Polygenic scores, estimates of an individual’s predisposition for complex traits and diseases, hold promise for identifying patients at risk of disease and guiding early, personalized treatments, but UCLA experts found the scores fail to account for the wide range of genetic diversity across individuals in all ancestries.
“Polygenic scores can estimate the likelihood of an individual having a certain trait by pulling together and analyzing the small effects of thousands to millions of common genetic variants into a single score, but their performance among individuals from diverse genetic backgrounds is limited,” says Bogdan Pasaniuc, PhD, a UCLA Health expert in statistical and computational methods for understanding genetic risk factors for common diseases.
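As a rough illustration of what "pulling together" many small effects into a single score can look like, the sketch below computes a score as a weighted sum of allele counts. This is a minimal sketch, not the study's pipeline: the function name `polygenic_score`, the three-variant toy data, and the effect sizes are all hypothetical, and real scores typically involve thousands to millions of variants.

```python
def polygenic_score(dosages, effect_sizes):
    """Weighted sum of allele dosages.

    dosages: copies of the effect allele per variant (0, 1, or 2).
    effect_sizes: one per-variant weight, e.g. from an association study.
    Both inputs are illustrative assumptions, not data from the article.
    """
    if len(dosages) != len(effect_sizes):
        raise ValueError("need exactly one effect size per variant")
    return sum(d * b for d, b in zip(dosages, effect_sizes))


# Hypothetical toy individual with three variants.
dosages = [2, 1, 0]
effect_sizes = [0.5, -0.25, 0.125]
score = polygenic_score(dosages, effect_sizes)  # 2*0.5 + 1*(-0.25) + 0*0.125
```

Each variant contributes only a small amount, so the score carries information only in aggregate, and it depends on the effect sizes estimated in the training population.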
The researchers’ analysis, published in Nature, shows that the accuracy of polygenic scores (PGSs) varies between individuals across a continuum of genetic ancestry, and this is true even in populations traditionally considered ‘homogeneous’ (e.g., Europeans), says Pasaniuc, the paper’s senior author.
PGS performance has commonly been assessed at the “population” level, such as in “Europeans,” by grouping individuals of similar ancestries into a single genetic-ancestry cluster, the authors said.
“Imposing artificial boundaries onto this continuum and ignoring the diversity, or ‘heterogeneity,’ within clusters can obscure variation within a group, conceal the similarities that may exist in individuals in different groups, and leave out individuals who do not fit neatly into a particular genetic ancestry,” said Yi Ding, a graduate student in bioinformatics at UCLA, a member of the Pasaniuc Lab, and the paper’s first author.
To provide a more precise estimate of PGS accuracy, the researchers developed a method to evaluate PGS accuracy at the individual level. To test it, they applied PGSs for 84 complex traits to data from more than 35,000 individuals in the UCLA ATLAS Precision Health Biobank, one of the most diverse biobanks in the world, in part because the Los Angeles area is home to one of the most ancestrally diverse populations globally.
The new tool’s “training” data came from a subset of individuals in the UK Biobank in the United Kingdom. In place of discrete genetic ancestries, a continuous metric of “genetic distance” was used to establish the position of each individual in the ATLAS database on the genetic-ancestry continuum, essentially showing how similar or dissimilar a target (ATLAS) individual’s genome was to the genomes in the UK Biobank training population.
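One way to picture a continuous genetic-distance metric is sketched below. It assumes, purely for illustration, that each individual is summarized by a few principal-component coordinates and that distance is the Euclidean distance from the center of the training data. The article does not spell out the metric, and the helper names and toy coordinates here are hypothetical.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of equal-length coordinate lists."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(len(points[0]))]


def genetic_distance(target_pcs, training_center):
    """Euclidean distance between a target individual and the training center.

    Assumption: individuals are represented by principal-component
    coordinates; this stands in for whatever metric the study used.
    """
    return math.dist(target_pcs, training_center)


# Hypothetical two-dimensional coordinates for four training individuals.
training_pcs = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
center = centroid(training_pcs)

near = genetic_distance([1.0, 1.0], center)  # sits at the training center
far = genetic_distance([4.0, 5.0], center)   # farther along the continuum
```

Because the result is a number rather than a label, every target individual gets a position on the continuum, including those who would not fit neatly into any one ancestry cluster.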
“We found that the more dissimilar—or genetically ‘distant’—a target individual’s genome was from the UK Biobank training data, the lower the accuracy of the PGS,” Ding said.
The accuracy of polygenic scores declined as genetic distance increased, even when the researchers looked specifically at genetic-ancestry groupings that have been considered homogeneous, such as individuals of European genetic ancestries. Conversely, some individuals not identified with European ancestry could have higher levels of genetic similarity to the training data. PGS performance could therefore differ between two individuals from the same ancestry but be comparable for two people from different ancestries, depending on their genetic similarity.
“Our genetic-distance metric outperformed discrete clustering in identifying individuals who could benefit from PGSs,” said Pasaniuc, a researcher at the David Geffen School of Medicine at UCLA and the UCLA Health Institute for Precision Health.
The research team identified several factors—subjects for ongoing and future studies—that could impact PGS accuracy and usefulness, especially in people with “admixed” ancestries. These are usually defined as individuals with recent ancestry from two or more continental sources—such as African Americans and Latinos.
Pasaniuc, whose research focuses on improving genetic risk factor predictions for people with admixed ancestry, said these individuals have “mosaic” genomes, with segments of different continental ancestries at every region. With different portions contributed by different ancestries, it is extremely difficult to accurately classify these individuals using conventional ancestry labels.
“For PGSs to be equitably used,” he says, “the assessment of PGS accuracy should account for the full spectrum of genetic diversity.”