New Method for Large-Scale Data Integration & Biomarker Identification Proposed

A research team has proposed a new algorithm, named NetMoss, for the integration of large-scale microbiome data and biomarker identification.

The study was published in Nature Computational Science. The research team was led by Professor Fangqing Zhao of the Beijing Institutes of Life Science, Chinese Academy of Sciences.

The relationship between the gut microbiome and human health has received increasing attention in recent years, and a large volume of heterogeneous data has accumulated at an unprecedented pace. However, extracting disease-related information from such data remains challenging.

On the one hand, the gut microbiome is readily influenced by factors such as diet and geography. Its composition may vary greatly among populations, which biases both the direct integration of data and abundance-based biomarker identification. On the other hand, the microbial abundance matrix is highly sparse, which makes it difficult for conventional computational methods to remove batch effects.

More on the New Algorithm

The newly proposed algorithm uses microbial interaction networks to effectively integrate data from different populations. It can quantify the topological differences between different network modules by comparing the perturbations of microbial networks in different states, thus enabling the identification of disease-associated biomarkers.

Compared with previous methods, NetMoss integrates different batches of microbial data with less bias and greater efficiency, mines disease-associated biomarkers, and identifies covariation patterns of microbial dysbiosis that drive the occurrence of multiple diseases.

In this study, the researchers collected 11,377 gut microbiome sequencing samples from patients and healthy controls, covering 78 studies, 37 diseases, and 13 countries or regions. Across these datasets from different populations, they found that currently used computational methods have great difficulty removing batch effects caused by experimental and sequencing processes.

To perform downstream analyses efficiently and avoid bias, the researchers developed a computational model for data integration and biomarker identification based on microbial interaction networks.

Microbial interaction networks are constructed individually and then integrated using different weights based on their structural characteristics. By quantifying the topological differences between different modules in diseased and healthy networks, the bacteria most sensitive to perturbation by external influences are identified as biomarkers.
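The general idea can be illustrated with a minimal sketch. This is not the NetMoss implementation: the function names, the use of Pearson correlation to build each network, the edge-density weighting, and the per-taxon score are all assumptions chosen for illustration, and the data are simulated. The sketch builds one network per study, combines the networks with weights, and ranks taxa by how much their connections change between the healthy and diseased states.

```python
import numpy as np

def correlation_network(abundance):
    """Build a taxon-by-taxon network from a samples x taxa matrix.

    Pearson correlation is an assumption made for this sketch.
    """
    net = np.nan_to_num(np.corrcoef(abundance, rowvar=False))
    np.fill_diagonal(net, 0.0)  # ignore self-correlation
    return net

def integrate_networks(networks, weights):
    """Weighted average of per-study networks (weights are normalized)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return np.tensordot(weights, np.stack(networks), axes=1)

def perturbation_score(healthy_net, disease_net):
    """Per-taxon mean absolute change in edge weight between two states."""
    return np.abs(healthy_net - disease_net).mean(axis=1)

def density(net):
    """Mean absolute edge weight, a stand-in for a structural feature."""
    return np.abs(net).mean()

rng = np.random.default_rng(0)

def simulate(n_samples, coupled):
    """Simulated data: taxa 0 and 1 co-vary only when `coupled` is True."""
    x = rng.normal(size=(n_samples, 5))
    if coupled:
        x[:, 1] = x[:, 0] + 0.1 * rng.normal(size=n_samples)
    return x

# Two hypothetical studies per state, each yielding its own network.
healthy = [correlation_network(simulate(200, True)) for _ in range(2)]
disease = [correlation_network(simulate(200, False)) for _ in range(2)]

healthy_net = integrate_networks(healthy, [density(n) for n in healthy])
disease_net = integrate_networks(disease, [density(n) for n in disease])

scores = perturbation_score(healthy_net, disease_net)
top_two = sorted(int(i) for i in np.argsort(scores)[-2:])
print(top_two)  # indices of the two taxa whose connections changed most
```

In this simulation the link between taxa 0 and 1 exists only in the healthy state, so those two taxa should receive the highest scores. The published method compares network modules rather than single edges, which this sketch does not attempt to reproduce.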

The researchers applied the algorithm to both simulated and real datasets and found it to be highly accurate and robust on both integrated and single datasets.

“Most of the biomarkers did not cause only one disease alone, but were significantly associated with multiple diseases. The similar dysbiosis pattern may provide important clues to the occurrence of different diseases,” says Zhao.

This new algorithm will help researchers understand the nature of microbiome-host interactions and better guide the prevention and treatment of many diseases.
