Researchers are developing a large-scale olfactory dataset to train artificial intelligence, potentially unlocking smell as a powerful tool for early disease detection.
By Alyx Arnett
It’s long been recognized that certain diseases carry distinct odors. Yet, the diagnostic potential of smell has remained largely untapped in modern medicine, relegated to anecdotal observations rather than standardized clinical practice. Now, a new initiative aims to change that by systematically teaching artificial intelligence (AI) to recognize the molecular patterns of smell, creating a pathway for a new generation of noninvasive diagnostics.
Researchers have created oMNIST, the first large-scale dataset designed to train AI models in the science of olfaction. The project’s goal is to establish a foundational resource that can accelerate the development of smell-based tests for a wide range of health conditions. By creating a standardized benchmark, oMNIST could enable the global research community to develop and compare algorithms, much as landmark datasets once did for computer vision.
The effort is driven by the promise of earlier disease detection. Many conditions, from metabolic disorders to neurodegenerative diseases, produce subtle changes in the body’s volatile organic compounds long before clinical symptoms appear.
“Smell offers a new source of data in health care,” says Vasant Dhar, a professor at NYU Stern and a co-creator of oMNIST. “Such data can provide an early warning system, the smell test, before visible signs of disease appear…. By the time detection occurs, it is often too late.”
Standardizing Data to Accelerate Discovery
A primary obstacle to advancing olfactory diagnostics has been the lack of standardized, shareable data. According to Dhar, individual research studies are often “bespoke,” with data collected in ways that cannot be easily compared or combined. This fragmentation has slowed progress and made it difficult to validate findings across different research groups.
oMNIST is designed to solve this problem by providing a common ground for innovation. “Progress can be much faster when data are pooled and available to the global research community,” Dhar says.
He compares the current state of olfaction research to computer vision 10 to 20 years ago, before the creation of large image datasets such as MNIST (handwritten digits) and ImageNet (millions of labeled images spanning many categories). Those resources spurred intense competition and led to breakthroughs, including the development of powerful neural network architectures that now define the field of AI. By creating a similar resource for smell, the oMNIST team hopes to ignite a comparable wave of innovation in diagnostics.
Clinical Potential and Laboratory Challenges
The potential applications for smell-based diagnostics are vast, with some of the most promising research focused on neurodegenerative and autoimmune diseases. “We know for sure that [neurodegenerative] diseases like Parkinson’s can be detected by smell by humans long before they are diagnosed—sometimes over ten years in advance,” Dhar says. “There’s no reason to believe that numerous other conditions cannot be similarly detected.”
From a clinical laboratory perspective, these diagnostics could be integrated into existing workflows using common sample types. “They are all relevant: urine, blood, sweat, and saliva,” Dhar says. “They carry different indicators of health within the body.” This flexibility suggests that future smell-based tests could be developed for various collection methods, depending on which biomarkers prove most effective for specific diseases.
However, significant technical hurdles remain. Accuracy and reproducibility are paramount in diagnostics, and the research is still in its early stages. “At the moment, we are dealing with more basic issues, such as how to pool data across subjects,” says Dhar. His team is currently working with mice, tackling fundamental challenges such as aligning neural responses across different animals and removing signal noise introduced by the recording equipment.
Before smell-based diagnostics can be considered for regulatory review and reimbursement, the underlying science must be solidified. Dhar emphasizes that “a lot” needs to happen first. “Once we establish a sound science of smell, meaning good predictive models, we will need to standardize data collection and analysis,” he says.
This foundational work is the next step on the long road to bringing the diagnostic power of smell into the clinical lab.
Photo: Vasant Dhar (courtesy of Vasant Dhar)