Summary: The free, publicly accessible resource leverages data from The Cancer Genome Atlas Network to classify patient tumor samples into molecular subtypes, advancing cancer diagnostics and treatment.

Takeaways:

  1. Revolutionizing Cancer Diagnosis: The resource includes 737 machine learning models based on data from 8,791 TCGA cancer samples, enabling precise classification of 106 cancer subtypes.
  2. Clinical Impact: By bridging TCGA’s data with clinical applications, the tool simplifies tumor subtyping and supports the design of cancer subtype-specific diagnostic assays and therapies.
  3. Collaborative Innovation: Developed by scientists from over a dozen institutions, this resource is the first of its kind, offering a scalable and reproducible solution to integrate molecular insights into cancer care.

A multi-institutional team of scientists has developed a free, publicly accessible resource to aid in the classification of patient tumor samples based on distinct molecular features identified by The Cancer Genome Atlas (TCGA) Network.

The resource comprises classifier models that can accelerate the design of cancer subtype-specific test kits for use in clinical trials and cancer diagnosis. This is an important advance because tumors belonging to different subtypes may vary in their response to cancer therapies.  

The resource is the first of its kind to bridge the gap between TCGA’s immense data library and clinical implementation.

A paper describing the tools was published online in Cancer Cell.

Supporting Cancer Subtyping Classification

“TCGA defined molecular subtypes for each major type of cancer. With this resource, we aimed to provide the clinical and scientific communities with the tools to assign a newly diagnosed tumor to one of these established subtypes,” says Peter W. Laird, PhD, the Peter and Emajean Cook Endowed Chair in Epigenetics at Van Andel Institute and the study’s lead corresponding author. “Our new resource will be a powerful asset for creating clinical assays based on the diverse molecular variations between cancers.”

TCGA was a decade-long, National Cancer Institute-led effort to create detailed molecular maps of 33 cancer types. Unlike traditional approaches that define cancers based on the organ or tissue in which they arise, TCGA identified nuanced genomic, epigenomic, proteomic and transcriptomic characteristics that more precisely describe cancer subtypes.


Andrew D. Cherniack, PhD, of the Broad Institute of MIT and Harvard and Kyle Ellrott, PhD, of the Knight Cancer Institute at Oregon Health & Science University also are corresponding authors of the paper, which represents a collaborative effort between scientists from more than a dozen research organizations.

“Since many TCGA molecular subtypes were generated using hundreds or thousands of features from multiple data types, scientists and physicians have asked us for help subtyping their samples,” Cherniack says. “Our resource greatly simplifies this process.”

Leveraging Data from The Cancer Genome Atlas Network

The team created the new resource by leveraging data from 8,791 TCGA cancer samples that represented 26 cancer cohorts and 106 cancer subtypes. They then used existing machine learning tools to develop and test nearly half a million models across six categories—gene expression, DNA methylation, miRNA, copy number, mutation calls and multi-omics—and selected those that performed best for inclusion in the online resource.   

In total, the resource contains 737 ready-to-use models, which represent the top models from each of the 26 cancer cohorts, the five training algorithms and six data types.  
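The selection step described above, keeping the best-performing models for each cohort, training algorithm and data type, can be illustrated with a minimal sketch. It is not the team's actual pipeline: the `Candidate` record, the `select_top_models` function, the algorithm labels, the model IDs and the scores are all hypothetical, and a single generic score stands in for whatever performance metric the authors used.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """One trained candidate model (hypothetical record layout)."""
    cohort: str       # cancer cohort label, e.g. "BRCA"
    algorithm: str    # training algorithm label (placeholder names below)
    data_type: str    # e.g. "gene expression", "DNA methylation"
    model_id: str     # made-up identifier
    score: float      # assumed metric where higher is better


def select_top_models(candidates):
    """Keep the highest-scoring candidate for each
    (cohort, algorithm, data type) combination."""
    best = {}
    for cand in candidates:
        key = (cand.cohort, cand.algorithm, cand.data_type)
        if key not in best or cand.score > best[key].score:
            best[key] = cand
    return best


# Illustrative, made-up candidates; none of these values come from the study.
candidates = [
    Candidate("BRCA", "algo_A", "gene expression", "m1", 0.81),
    Candidate("BRCA", "algo_A", "gene expression", "m2", 0.88),
    Candidate("BRCA", "algo_B", "gene expression", "m3", 0.79),
    Candidate("BRCA", "algo_A", "DNA methylation", "m4", 0.84),
    Candidate("BRCA", "algo_A", "DNA methylation", "m5", 0.80),
]

top = select_top_models(candidates)
for key, cand in sorted(top.items()):
    print(key, cand.model_id, cand.score)
```

As a rough check on scale, 26 cohorts, five algorithms and six data types give at most 26 × 5 × 6 = 780 combinations. The reported total of 737 would be consistent with some combinations having no model, though the article does not say so.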

“A major element of this effort was working to ensure that these models could be deployed by other groups onto new datasets,” Ellrott says. “All too often this type of work is difficult to replicate or apply to new samples.”

Co-first authors of the study include Christopher K. Wong of the University of California, Santa Cruz; Christina Yau of the University of California, San Francisco, and Buck Institute for Research on Aging; Mauro A. A. Castro of the Federal University of Paraná; Jordan E. Lee and Brian J. Karlberg of Oregon Health & Science University; Jasleen K. Grewal of BC Cancer; Vincenzo Lagani of JADBio Gnosis DA and Ilia State University; and Bahar Tercan of the Institute for Systems Biology.