Genomic prediction across populations

A new paper by Professor David Balding and colleagues has been published in PLOS Genetics.

Using Genetic Distance to Infer the Accuracy of Genomic Prediction
Marco Scutari, Ian Mackay, David Balding
Published: September 2, 2016

The authors found that the correlation between true and predicted trait values decays approximately linearly with either FST or the mean kinship between the training and target populations. They illustrate this relationship using simulations and data sets from mice, wheat and humans.
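
To make the "approximately linear decay" concrete, here is a minimal Python sketch. It assumes you already have accuracy estimates (Pearson correlations between true and predicted values) at several known genetic distances, for example from training/validation splits. It fits an ordinary least-squares line and reads it off at the distance of a new target population. The function names and the numbers in the usage lines are hypothetical, and this is an illustration of the general idea, not the authors' exact procedure.

```python
from math import sqrt


def pearson(true_values, predicted_values):
    """Pearson correlation between true and predicted trait values (predictive accuracy)."""
    n = len(true_values)
    mt = sum(true_values) / n
    mp = sum(predicted_values) / n
    stp = sum((t - mt) * (p - mp) for t, p in zip(true_values, predicted_values))
    stt = sum((t - mt) ** 2 for t in true_values)
    spp = sum((p - mp) ** 2 for p in predicted_values)
    return stp / sqrt(stt * spp)


def fit_linear_decay(distances, accuracies):
    """Ordinary least-squares fit of: accuracy = intercept + slope * distance."""
    n = len(distances)
    md = sum(distances) / n
    ma = sum(accuracies) / n
    sdd = sum((d - md) ** 2 for d in distances)
    sda = sum((d - md) * (a - ma) for d, a in zip(distances, accuracies))
    slope = sda / sdd  # negative when accuracy decays with distance
    intercept = ma - slope * md
    return intercept, slope


def extrapolate_accuracy(intercept, slope, target_distance):
    """Evaluate the fitted line at the target population's distance (e.g. its FST)."""
    return intercept + slope * target_distance


# Hypothetical, made-up accuracies at four training-target distances.
distances = [0.00, 0.02, 0.04, 0.06]
accuracies = [0.60, 0.52, 0.44, 0.36]
intercept, slope = fit_linear_decay(distances, accuracies)
print(intercept, slope, extrapolate_accuracy(intercept, slope, 0.08))
```

On these made-up, exactly linear numbers the fit recovers an intercept of about 0.60 and a slope of about -4, giving an extrapolated accuracy of about 0.28 at a distance of 0.08. With real data, such an extrapolation is presumably most trustworthy close to the range of distances the line was fitted on.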

The availability of increasing amounts of genomic data is making statistical models that predict traits of interest a mainstay of many applications in the life sciences. Applications range from medical diagnostics for common and rare diseases to breeding for characteristics such as disease resistance in plants and animals of commercial interest. We explored an implicit assumption of how such prediction models are often assessed: that the individuals whose traits we would like to predict originate from the same population as those used to train the models. This is often not the case, especially for plants and animals that are part of selection programs. To study this problem we proposed a model-agnostic approach to infer the accuracy of prediction models as a function of two common measures of genetic distance, FST and mean kinship. Using data from plant, animal and human genetics, we found that accuracy decays approximately linearly in either of those measures. Quantifying this decay has fundamental applications in all branches of genetics, as it measures how well studies generalise to different populations.
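
One of the two distance measures, mean kinship between the training and target populations, can be sketched in Python as follows. This assumes genotypes coded 0/1/2 (copies of one allele per SNP) and uses a simple standardized genomic relationship matrix, halved to the kinship-coefficient scale, as the kinship estimator. That estimator choice is an assumption for illustration and may differ from the one used in the paper. Allele frequencies are estimated from the pooled individuals, and the function `mean_kinship` (a hypothetical name) averages kinship over every training-target pair.

```python
from math import sqrt


def mean_kinship(train, target):
    """Average kinship between every training and every target individual.

    train, target: lists of genotype vectors coded 0/1/2 over the same SNPs.
    """
    pooled = train + target
    n_snps = len(pooled[0])
    # Frequency of the counted allele at each SNP, estimated from the pooled sample.
    freqs = [sum(ind[j] for ind in pooled) / (2 * len(pooled)) for j in range(n_snps)]
    # Keep only polymorphic SNPs so the standardization never divides by zero.
    keep = [j for j, p in enumerate(freqs) if 0 < p < 1]
    if not keep:
        raise ValueError("no polymorphic SNPs")

    def standardize(ind):
        # Center by 2p and scale by the binomial standard deviation sqrt(2p(1-p)).
        return [(ind[j] - 2 * freqs[j]) / sqrt(2 * freqs[j] * (1 - freqs[j])) for j in keep]

    z_train = [standardize(ind) for ind in train]
    z_target = [standardize(ind) for ind in target]
    total = 0.0
    for a in z_train:
        for b in z_target:
            # Genomic relationship for this pair, halved to the kinship scale.
            total += sum(x * y for x, y in zip(a, b)) / len(keep) / 2
    return total / (len(z_train) * len(z_target))


train = [[0, 1, 2], [1, 1, 0]]
target = [[2, 1, 0], [0, 2, 1]]
print(mean_kinship(train, target))
```

Because allele frequencies come from the pooled sample, the values are relative to that pool: two populations that share fewer alleles than the pool average get a negative mean kinship.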