Cancers begin with abnormal changes in individual cells, and the ability to track the accumulation of mutations at the single-cell level can shed new light on the early stages of the disease. Such knowledge could enable more effective early detection and treatment options for patients as well as more accurate predictions of disease progression.
According to a paper in Nature Communications, a team of Rice University researchers led by Luay Nakhleh has developed a platform for integrating DNA and RNA data from single-cell sequencing with greater speed and precision than more recent, state-of-the-art technologies. The method, called mapping cross-domain nucleic acid, or MaCroDNA, relies on a classical algorithm to identify matching pairs of data from DNA, the genetic blueprint of a cell, and RNA, a cell’s instruction manual for protein assembly.
“Imagine you are given two large sets of photos of cars with the license plates and other identifying features blurred,” says Mohammadamin Edrisi, a Rice PhD student in computer science and lead author on the study. “One set contains photos of the cars taken from the front, while the other set has photos of the back of the cars, and someone asks you to find the pairs of photos that belong to the same car. This is a metaphor for the problem we have tried to solve. The cars are cancer cells, and the two sets of photos are DNA and RNA data measurements.”
In fact, the scenario that MaCroDNA is designed to address is more complex than that.
“In a typical cancer single-cell sequencing experiment, the DNA and RNA data sets are obtained from different cells in the tumor sample,” says Nakhleh, the senior author on the study. “So the matching in such a scenario happens between cells that we know are not the same cells.
“To continue the analogy, think of each photo as being taken of the front or back of a different Toyota car, and we want to match pairs of photos that belong to a car of the same model: the front and back of a Toyota Camry, of a Toyota Corolla, etc. Different car models here are analogous to different clones within a heterogeneous tumor, where each clone is expected to have very similar, yet not completely identical, DNA and RNA signatures across all cells within the clone.”
Single-cell sequencing has developed significantly over the past decade, driving discovery across various fields of biology. This sequencing technique is an effective tool for studying how changes at the level of the genetic code impact cells’ makeup or functioning, making it easier to track the types of transformations that turn a population of healthy cells into malignant tissue.
“Cancer cells demonstrate abnormal RNA patterns, and one of the reasons for that is DNA mutations,” Edrisi says.
In their quest to identify the best tool for the task, the researchers tested a variety of methods against a real biological dataset with known matching DNA-RNA pairs.
“We tested the state-of-the-art method, named clonealign, and the other widely used methods using a real dataset with ground truth information for accuracy measurement,” Edrisi says. “Interestingly, using this dataset was one of the novelties in our work. Previous studies relied on simulated data for accuracy measurements, even though there is no scientific consensus as to how to go about simulating such data.”
Of the methods they tested, the researchers found that combining a classical correlation coefficient with the maximum weighted bipartite matching algorithm yielded the most accurate results. That combination is the core of MaCroDNA, which outperformed clonealign by a significant margin.
“The surprising part of our work was that using the classical correlation instead of clonealign’s complicated formula and incorporating it in an algorithm from the 1950s led to the best accuracy we have ever witnessed,” Edrisi says. “The lesson is that we should never judge an algorithm based on its complexity. Give it a shot, and make sure it is compared to the others in a fair setting.”
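For readers who want a feel for the idea, here is a minimal Python sketch of pairing cells by correlation and then choosing the one-to-one assignment with the highest total score. It is an illustration under stated assumptions, not the published MaCroDNA code: the article does not say which correlation coefficient is used, so Pearson is assumed; the function names `pearson` and `match_cells` and the toy numbers are hypothetical; and the matching is found by exhaustive search, which only works for a handful of cells, whereas a real data set would need a polynomial-time assignment solver.

```python
from itertools import permutations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat a constant profile as uncorrelated
    return cov / sqrt(var_x * var_y)


def match_cells(dna_profiles, rna_profiles):
    """Pair each DNA cell with a distinct RNA cell, maximizing total correlation.

    Returns a list where entry i is the index of the RNA cell matched to
    DNA cell i. Assumes there are at least as many RNA cells as DNA cells.
    Exhaustive search over assignments: suitable for toy inputs only.
    """
    # Edge weights of the bipartite graph: one correlation per DNA-RNA pair.
    weights = [[pearson(d, r) for r in rna_profiles] for d in dna_profiles]
    n = len(dna_profiles)
    best = max(
        permutations(range(len(rna_profiles)), n),
        key=lambda perm: sum(weights[i][perm[i]] for i in range(n)),
    )
    return list(best)


# Hypothetical per-gene measurements for three DNA cells and three RNA cells.
dna = [[2, 2, 4, 4], [4, 2, 2, 1], [1, 3, 1, 3]]
rna = [[9, 1, 2, 0], [1, 8, 2, 9], [3, 2, 8, 7]]
print(match_cells(dna, rna))  # → [2, 0, 1]
```

In this toy example, the first DNA cell is paired with the third RNA cell, the second with the first, and the third with the second, because that assignment gives the largest sum of correlations.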
The method is available for use in cancer research on the role of DNA-RNA dynamics in the emergence of cancer.
Nakhleh is the William and Stephanie Sick Dean of Rice’s George R. Brown School of Engineering and a professor of computer science and biosciences.
The research was supported in part by the National Science Foundation (1812822, 2106837).
Featured image: Mohammadamin Edrisi is a Rice University graduate student and lead author on a paper published in Nature Communications. Photo by Jeff Fitlow/Rice University