SelfDecode released the results of a study it performed using popular tools for genetic phasing and imputation to home in on the factors that maximize imputation accuracy.

Genotype imputation is an important step in genetic analysis and risk estimation. DNA chips typically read only about 500,000 to 700,000 variants out of the more than 3 billion base pairs in the human genome, which leaves many gaps in the data that most commercial DNA chips provide. Genotype imputation fills in those gaps, which matters for downstream applications such as estimating a person’s genetic risk of heart disease.

“How well we can impute a person’s genome depends on a number of factors, including the reference populations used, phasing (which is how the data is prepared), and which software or tools end up being used,” says Puya Yazdi, MD, chief science officer at SelfDecode. “That is why our research and development team tested, compared, and benchmarked cutting-edge phasing and imputation software against different chips, data preparation methods, and different reference datasets. We tested 144 combinations in total!”

The goal of the study was to identify methods that maximize imputation accuracy. The phasing and imputation software used in the study included Beagle5.4, Impute5, ShapeIT4, Minimac4, and Eagle2.4.1.

The SelfDecode team also compared imputation accuracy metrics, with the goal of understanding which are the most reliable and how they can be used in different scenarios.
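
The announcement does not say which accuracy metrics were compared. As a rough illustration only, the sketch below assumes two measures that are commonly used in imputation work: genotype concordance and the squared correlation (r²) between true genotypes and imputed dosages. The function names and the toy data are hypothetical and are not taken from the SelfDecode pipeline.

```python
def concordance(true_genotypes, imputed_dosages):
    """Fraction of samples whose best-guess genotype matches the truth.

    The best guess is the imputed dosage rounded to the nearest allele count.
    """
    if not true_genotypes or len(true_genotypes) != len(imputed_dosages):
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(
        1 for t, d in zip(true_genotypes, imputed_dosages) if int(d + 0.5) == t
    )
    return matches / len(true_genotypes)


def dosage_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true allele counts and dosages."""
    n = len(true_genotypes)
    if n == 0 or n != len(imputed_dosages):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_t = sum(true_genotypes) / n
    mean_d = sum(imputed_dosages) / n
    cov = sum(
        (t - mean_t) * (d - mean_d)
        for t, d in zip(true_genotypes, imputed_dosages)
    )
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_d = sum((d - mean_d) ** 2 for d in imputed_dosages)
    if var_t == 0 or var_d == 0:
        return float("nan")  # undefined when either side has no variation
    return cov * cov / (var_t * var_d)


# Toy rare variant (made-up data): one carrier among ten samples.
# The carrier is missed, and a non-carrier gets a small spurious dosage.
true_genotypes = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
imputed_dosages = [0.4, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

print("concordance:", concordance(true_genotypes, imputed_dosages))
print("dosage r2:  ", dosage_r2(true_genotypes, imputed_dosages))
```

In this toy case, concordance stays high because nine of the ten samples are non-carriers who are trivially called correctly, while r² is close to zero because the single carrier was missed. This is one reason a metric that looks reliable for common variants can be misleading for rare ones.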

As a result of the study, SelfDecode has created a processing and comparison pipeline that can help researchers design better chips by choosing SNPs that maximize phasing and imputation accuracy.

Additionally, researchers can use this information to choose the best combination of phasing and imputation tools for their chip, datasets, and computational needs in order to produce optimal results.

“Most importantly, we’ve found that all of the current state-of-the-art tools have limits and drawbacks. For example, they are not accurate enough to impute rare and ultra-rare variants. Our team is working on overcoming some of these limits,” says Yazdi. “We are currently working on our in-house imputation tool by employing the latest scientific advances, including AI and machine learning.”

Featured image: SelfDecode DNA Test Kit. Photo: SelfDecode