New study reveals diagnostic disparities across demographic groups, prompts development of bias-reduction framework.


Researchers at Harvard Medical School (HMS) have discovered that artificial intelligence (AI) models used to analyze pathology samples can infer patient demographic information, leading to biased cancer diagnoses across different populations.

The study, published in Cell Reports Medicine, analyzed four major pathology AI models designed to diagnose cancer and found unequal performance in detecting and differentiating cancers based on patients’ self-reported gender, race, and age.

“Reading demographics from a pathology slide is thought of as a ‘mission impossible’ for a human pathologist, so the bias in pathology AI was a surprise to us,” says Kun-Hsing Yu, associate professor of biomedical informatics in the Blavatnik Institute at HMS and HMS assistant professor of pathology at Brigham and Women’s Hospital, in a release.

The research team fed the AI models a large, multi-institutional repository of pathology slides spanning 20 cancer types. All four models demonstrated biased performance, providing less accurate diagnoses for patients in specific demographic groups. The models struggled to differentiate lung cancer subtypes in African American and male patients, and breast cancer subtypes in younger patients. Performance disparities occurred in approximately 29% of the diagnostic tasks conducted.

Three Sources of Bias Identified

The researchers identified three explanations for the bias. First, unequal sample sizes in training data make it harder for models to diagnose samples from underrepresented groups. Second, differential disease incidence means some cancers are more common in certain populations, so models become better at diagnosing those groups. Third, AI models can detect subtle molecular differences in samples from different demographic groups, potentially learning signals tied more to demographics than to disease.

“We found that because AI is so powerful, it can differentiate many obscure biological signals that cannot be detected by standard human evaluation,” says Yu in a release. “As a result, the models may learn signals that are more related to demographics than disease.”

FAIR-Path Framework Reduces Bias

To address these issues, the team developed FAIR-Path, a framework based on contrastive learning that teaches models to emphasize differences between cancer types while downplaying differences between demographic groups. When applied to the tested models, FAIR-Path reduced diagnostic disparities by approximately 88%.
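The article does not spell out FAIR-Path's exact training objective, but the contrastive-learning idea can be sketched in broad strokes. The PyTorch snippet below is a minimal illustration under stated assumptions, not the authors' implementation: it pairs a supervised contrastive term over cancer-type labels (in the style of Khosla et al., 2020) with a penalty that discourages the same embeddings from clustering by demographic group. The names fairness_aware_loss, fairness_weight, and demo_labels are hypothetical placeholders for this example.

    # Illustrative sketch only -- NOT the published FAIR-Path code.
    # Assumes PyTorch; names and weighting are invented for this example.
    import torch
    import torch.nn.functional as F

    def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
        # Pull samples sharing a label together, push others apart
        # (supervised contrastive loss in the style of Khosla et al., 2020).
        z = F.normalize(embeddings, dim=1)
        sim = (z @ z.T) / temperature
        n = z.size(0)
        eye = torch.eye(n, dtype=torch.bool, device=z.device)
        pos_mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)) & ~eye
        sim = sim.masked_fill(eye, float("-inf"))        # exclude self-pairs
        log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
        pos_log_prob = torch.where(pos_mask, log_prob, torch.zeros_like(log_prob))
        per_anchor = -pos_log_prob.sum(dim=1) / pos_mask.sum(dim=1).clamp(min=1)
        return per_anchor[pos_mask.any(dim=1)].mean()    # anchors with >=1 positive

    def fairness_aware_loss(embeddings, cancer_labels, demo_labels, fairness_weight=0.5):
        # Hypothetical combined objective: tighten cancer-type clusters while
        # penalizing embeddings that also cluster by demographic group.
        cancer_term = supervised_contrastive_loss(embeddings, cancer_labels)
        demo_term = supervised_contrastive_loss(embeddings, demo_labels)
        return cancer_term - fairness_weight * demo_term

    # Toy usage with random features standing in for slide-level embeddings.
    emb = torch.randn(32, 128, requires_grad=True)
    cancer_subtype = torch.randint(0, 3, (32,))   # e.g., three cancer subtypes
    demo_group = torch.randint(0, 2, (32,))       # e.g., two demographic groups
    loss = fairness_aware_loss(emb, cancer_subtype, demo_group)
    loss.backward()

In a real pipeline, such a term would be applied while training the diagnostic model rather than to fixed embeddings, and the demographic penalty and its weight would need careful tuning; the published framework may differ in both the loss formulation and how demographic labels are handled.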

“We show that by making this small adjustment, the models can learn robust features that make them more generalizable and fairer across different populations,” says Yu in a release.

The finding suggests bias can be reduced without training models on completely representative data, which Yu notes is encouraging for improving AI fairness in medical applications.

The research team is now collaborating with institutions worldwide to investigate pathology AI bias across different demographic groups and clinical settings. They are also exploring ways to extend FAIR-Path to settings with limited sample sizes and investigating how AI bias contributes to demographic discrepancies in healthcare outcomes.

“I think there’s hope that if we are more aware of and careful about how we design AI systems, we can build models that perform well in every population,” says Yu in a release.

The work was supported by the National Institute of General Medical Sciences and the National Heart, Lung, and Blood Institute at the National Institutes of Health, among other funding sources.

Photo caption: Pathology images of human tissue samples

Photo credit: The Cancer Genome Atlas
