An artificial intelligence (AI) model that can diagnose COVID-19 while preserving the privacy of patient data has been developed by researchers in the UK and China.

The researchers based their model on more than 9,000 CT scans from approximately 3,300 patients in 23 hospitals in the UK and China. Their results, reported in the journal Nature Machine Intelligence, provide a framework for making AI techniques more trustworthy and accurate, especially in areas such as medical diagnosis where privacy is vital.

The international team, led by the University of Cambridge and the Huazhong University of Science and Technology, used a technique called federated learning to build their model. Using federated learning, an AI model in one hospital or country can be independently trained and verified using a dataset from another hospital or country, without data sharing.
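The idea can be illustrated with a toy sketch of federated averaging, one common federated learning scheme. The article does not say which aggregation rule the team used, so this is an assumption rather than their method. The linear model, the synthetic "hospital" data and the names `local_update`, `federated_average` and `make_site` are all hypothetical. Each site trains on its own data, and only model weights are sent for averaging.

```python
import random

def local_update(weights, data, lr=0.1, epochs=5):
    """Run a few epochs of gradient descent on one site's private data.

    `data` is a list of (features, label) pairs that never leaves the site.
    The model is a toy linear regressor; only updated weights are returned.
    """
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            pred = sum(wi * xi for wi, xi in zip(w, x))
            err = pred - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w

def federated_average(site_weights, site_sizes):
    """Combine site models, weighting each by its number of local samples."""
    total = sum(site_sizes)
    dim = len(site_weights[0])
    return [
        sum(w[j] * n for w, n in zip(site_weights, site_sizes)) / total
        for j in range(dim)
    ]

def make_site(n, rng, true_w=(2.0, -1.0)):
    """Hypothetical synthetic data standing in for one hospital's records."""
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        y = sum(t * xi for t, xi in zip(true_w, x))
        data.append((x, y))
    return data

rng = random.Random(0)
sites = [make_site(n, rng) for n in (40, 25, 60)]  # three sites, unequal sizes

global_w = [0.0, 0.0]
for _ in range(20):  # communication rounds
    # Each site starts from the shared model and trains locally.
    local = [local_update(global_w, d) for d in sites]
    # The coordinator sees weights and sample counts, never raw records.
    global_w = federated_average(local, [len(d) for d in sites])
```

In this sketch the shared model approaches the weights used to generate the synthetic data, even though no site ever sees another site's records. A real deployment would swap the toy regressor for a deep network operating on CT volumes.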

“AI has a lot of limitations when it comes to COVID-19 diagnosis, and we need to carefully screen and curate the data so that we end up with a model that works and is trustworthy,” says study co-author Hanchen Wang from Cambridge’s Department of Engineering. “Where earlier models have relied on arbitrary open-sourced data, we worked with a large team of radiologists from the NHS and Wuhan Tongji Hospital Group to select the data, so that we were starting from a strong position.”

The researchers used two well-curated external validation datasets of appropriate size to test their model and ensure that it would work well on datasets from different hospitals or countries.

“Before COVID-19, people didn’t realize just how much data you needed to collect in order to build medical AI applications,” says co-author Michael Roberts, PhD, from AstraZeneca and Cambridge’s Department of Applied Mathematics and Theoretical Physics. “Different hospitals, different countries all have their own ways of doing things, so you need the datasets to be as large as possible in order to make something that will be useful to the widest range of clinicians.”

The researchers based their framework on three-dimensional CT scans instead of two-dimensional images. They used 9,573 CT scans from 3,336 patients collected from 23 hospitals located in China and the UK.

The researchers also had to mitigate bias caused by differences between the datasets. They used federated learning to train an AI model that generalizes better, while preserving the privacy of each data center in a collaborative setting.

For a fair comparison, the researchers validated all the models on the same data, which did not overlap with the training data. The team also had a panel of radiologists make diagnostic predictions based on the same set of CT scans, and compared the accuracy of the AI models with that of the human professionals.
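A minimal sketch of this kind of comparison follows. It assumes binary labels, and all scan IDs and predictions are made-up toy values, not results from the study. The helper names `assert_disjoint` and `diagnostic_metrics` are hypothetical. The sketch first checks that the test scans are disjoint from the training scans, then scores the model and a reader on the same cases.

```python
def assert_disjoint(train_ids, test_ids):
    """Raise if any scan ID appears in both the training and the test split."""
    overlap = set(train_ids) & set(test_ids)
    if overlap:
        raise ValueError(f"test scans also seen in training: {sorted(overlap)}")

def diagnostic_metrics(y_true, y_pred):
    """Return accuracy, sensitivity and specificity for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }

# Made-up toy values for illustration only; not data from the study.
train_ids = ["scan_001", "scan_002", "scan_003"]
test_ids = [f"scan_{i}" for i in range(101, 109)]
assert_disjoint(train_ids, test_ids)

y_true      = [1, 1, 1, 0, 0, 0, 1, 0]  # reference diagnosis per test scan
model_pred  = [1, 1, 0, 0, 0, 0, 1, 0]  # hypothetical AI model output
reader_pred = [1, 0, 1, 0, 1, 0, 1, 0]  # hypothetical radiologist reads

# Both are scored against the same reference labels on the same scans.
model_scores = diagnostic_metrics(y_true, model_pred)
reader_scores = diagnostic_metrics(y_true, reader_pred)
```

Scoring every model and every reader against one shared, held-out set is what makes the resulting numbers directly comparable.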

The researchers say their model is useful not just for COVID-19, but for any other disease that can be diagnosed using a CT scan.

“The next time there’s a pandemic, and there’s every reason to believe that there will be, we’ll be in a much better position to leverage AI techniques quickly so that we can understand new diseases faster,” says Wang.

The researchers are now collaborating with the newly established WHO Hub for Pandemic and Epidemic Intelligence to explore the possibility of advancing privacy-preserving digital healthcare frameworks.