Polygenic score
A polygenic score, also called a polygenic risk score, genetic risk score, or genome-wide score, is a number based on variation in multiple genetic loci and their associated weights (see regression analysis).[1][2] It serves as the best prediction for the trait that can be made when taking into account variation in multiple genetic variants.
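As a minimal sketch of the definition above, a polygenic score can be computed as a weighted sum of allele counts. The function name `polygenic_score`, the 0/1/2 allele-count coding, and the weights below are illustrative assumptions, not values from any study.

```python
def polygenic_score(allele_counts, weights):
    """Weighted sum of allele counts (0, 1 or 2 copies) across variants."""
    if len(allele_counts) != len(weights):
        raise ValueError("need exactly one weight per variant")
    return sum(count * weight for count, weight in zip(allele_counts, weights))

# Hypothetical per-variant weights (e.g. regression coefficients); made up here.
weights = [0.5, -0.25, 1.0]
# One individual's allele counts at the same three variants.
person = [2, 1, 0]

score = polygenic_score(person, weights)
print(score)  # 2*0.5 + 1*(-0.25) + 0*1.0 = 0.75
```

In practice the sum runs over thousands to millions of variants, but the arithmetic is the same.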
Polygenic scores are widely employed in animal, plant, and behavioral genetics for prediction and for understanding genetic architectures. In a genome-wide association study (GWAS), a polygenic score that predicts substantially better than the genome-wide statistically significant hits alone indicates that the trait is affected by more variants than just those hits, and that larger sample sizes will yield more hits. A combination of low variance explained and high heritability, as measured by GCTA, twin studies, or other methods, indicates that a trait may be massively polygenic and affected by thousands of variants.
Once a polygenic score explaining at least a few percent of variance has been created, it can be used as a lower bound to test whether heritability estimates may be biased; to measure the genetic overlap of traits (genetic correlation), which might indicate, e.g., shared genetic bases for groups of mental disorders; to measure group differences in a trait such as height; to examine changes in a trait over time due to natural selection, indicative of a soft selective sweep, as for intelligence (where the changes in frequency would be too small to detect at each individual hit but the polygenic score declines); in Mendelian randomization (assuming no pleiotropy with relevant traits); to detect and control for genetic confounds in outcomes (e.g., the correlation of schizophrenia with poverty); and to investigate gene–environment interactions.
Polygenic scores are widely used in animal and plant breeding (where the approach is usually termed genomic prediction) because of their practical value in breeding improved livestock and crops.[3] Their use in human studies is increasing.[4][5]
Estimating weights
Weights are usually estimated using some form of regression analysis. Because the number of genomic variants (usually SNPs) is usually larger than the sample size, OLS multiple regression cannot be used (the p > n problem[6][7]). Instead, researchers use other methods, including regressing the trait on one variant at a time (the usual approach in studies with human data). Because the choice of SNPs affects predictive power, polygenic scores are often constructed from several sets of SNPs selected at different p-value thresholds, such as all genome-wide statistically significant hits, all SNPs with p < 0.05, or all SNPs with p < 0.50, and the best-performing score is used for further analysis; especially for highly polygenic traits, the best polygenic score will tend to use most or all SNPs.[8]
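The one-variant-at-a-time regression and threshold comparison described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure of any cited study: the data are simulated, the p-values use a normal approximation to the usual t-test, the split into training and validation samples is a choice made here, and all names (`marginal_fit`, `selected`, `best_threshold`) are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

def marginal_fit(x, y):
    """Regress y on a single variant x; return (slope, approximate two-sided p)."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0, 1.0  # no variation, so no information about this variant
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    r2 = min(sxy * sxy / (sxx * syy), 1 - 1e-12)
    z = math.sqrt(r2 * (n - 2) / (1 - r2))    # test statistic for the slope
    p = 2 * (1 - NormalDist().cdf(z))         # normal approximation (assumption)
    return sxy / sxx, p

# Simulated data (assumption): 20 SNPs, of which the first 3 affect the trait.
random.seed(1)
n_train, n_valid, n_snps = 150, 50, 20
true_effects = [0.5, -0.4, 0.3] + [0.0] * (n_snps - 3)
rows = [[random.choice([0, 1, 2]) for _ in range(n_snps)]
        for _ in range(n_train + n_valid)]
trait = [sum(g * b for g, b in zip(row, true_effects)) + random.gauss(0, 1)
         for row in rows]
train_rows, valid_rows = rows[:n_train], rows[n_train:]
train_trait, valid_trait = trait[:n_train], trait[n_train:]

# Step 1: estimate one weight per SNP, each from its own simple regression.
fits = [marginal_fit(column, train_trait) for column in zip(*train_rows)]

# Step 2: build one SNP set per p-value threshold (1.0 keeps every SNP).
thresholds = [5e-8, 0.05, 0.5, 1.0]
selected = {t: [j for j, (_, p) in enumerate(fits) if p <= t] for t in thresholds}

# Step 3: score the validation sample with each set and keep the best one.
def score(row, snp_ids):
    return sum(fits[j][0] * row[j] for j in snp_ids)

validation_r = {t: pearson([score(row, selected[t]) for row in valid_rows],
                           valid_trait)
                for t in thresholds}
best_threshold = max(thresholds, key=lambda t: validation_r[t])
print(best_threshold, {t: len(ids) for t, ids in selected.items()})
```

Choosing the threshold on a sample separate from the one used to estimate the weights avoids rewarding a score for fitting noise in its own training data.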
The standard GWAS regression can be improved on using penalized regression methods such as the LASSO or ridge regression.[1] Penalized regression can be interpreted as placing informative priors on how many genetic variants are expected to affect a trait and on the distribution of their effect sizes; Bayesian counterparts exist for the LASSO and ridge regression, and other priors have been suggested and used. They can perform better in some circumstances.[9] A multi-dataset, multi-method study[7] compared 15 methods across four datasets and found minimum redundancy maximum relevance to be the best-performing method. Furthermore, variable selection methods tended to outperform other methods. Variable selection methods do not use all the genomic variants available in a dataset but attempt to select an optimal subset of variants to use. This leads to less overfitting but more bias (see bias-variance tradeoff).
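To illustrate how a penalty allows all variants to be fitted jointly even when there are more variants than individuals, the following is a small sketch of the LASSO fitted by coordinate descent. It is not the minimum redundancy maximum relevance method or any implementation from the cited studies; the data are simulated, and the function names, the penalty value, and the number of sweeps are assumptions chosen for illustration.

```python
import random

def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; values within the penalty become 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0

def lasso_coordinate_descent(X, y, penalty, n_sweeps=100):
    """Minimise (1/2n)*||y - X b||^2 + penalty*sum|b_j| for centred X and y."""
    n, m = len(X), len(X[0])
    beta = [0.0] * m
    residual = list(y)  # equals y - X @ beta, kept up to date below
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(m)]
    for _ in range(n_sweeps):
        for j in range(m):
            if col_ss[j] == 0.0:
                continue  # variant with no variation in this sample
            # Covariance of variant j with the residual that ignores its own term.
            rho = sum(X[i][j] * (residual[i] + beta[j] * X[i][j])
                      for i in range(n)) / n
            new_beta = soft_threshold(rho, penalty) / col_ss[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * X[i][j]
                beta[j] = new_beta
    return beta

def center(values):
    m = sum(values) / len(values)
    return [v - m for v in values]

# Simulated data (assumption): 60 SNPs but only 30 individuals, so p > n.
random.seed(2)
n_people, n_snps = 30, 60
columns = [center([random.choice([0, 1, 2]) for _ in range(n_people)])
           for _ in range(n_snps)]
true_effects = [1.0, -0.8, 0.6] + [0.0] * (n_snps - 3)
y = center([sum(true_effects[j] * columns[j][i] for j in range(n_snps))
            + random.gauss(0, 0.5) for i in range(n_people)])
X = [list(row) for row in zip(*columns)]

# Smallest penalty at which every coefficient is shrunk exactly to zero.
lam_max = max(abs(sum(X[i][j] * y[i] for i in range(n_people))) / n_people
              for j in range(n_snps))
beta = lasso_coordinate_descent(X, y, penalty=0.5 * lam_max)
n_selected = sum(1 for b in beta if b != 0.0)
print(n_selected, "of", n_snps, "variants kept")
```

Because the penalty sets many coefficients exactly to zero, the LASSO acts as a form of variable selection, whereas ridge regression shrinks all coefficients but keeps every variant in the model.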
Predictive validity
The benefit of polygenic scores is that they can be used to predict future outcomes. This has large practical benefits for animal breeding because it increases selection precision and allows for shorter generations, both of which speed up evolution.[10][3] In humans, polygenic scores can be used to predict future disease susceptibility and for embryo selection.[4][11]
Some accuracy values are given below for comparison purposes. These are given in terms of correlations and have been converted from explained variance if given in that format in the source.
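The conversion mentioned above is a square root: a score that explains a proportion R² of the variance in a trait correlates with it at r = √R². A minimal sketch, with a hypothetical function name and an illustrative input value:

```python
import math

def correlation_from_variance_explained(r_squared):
    """Convert a proportion of variance explained (0 to 1) to a correlation."""
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("variance explained must be between 0 and 1")
    return math.sqrt(r_squared)

# Illustrative input: 9% of variance explained corresponds to r of about 0.30.
r = correlation_from_variance_explained(0.09)
print(round(r, 2))
```

The square root recovers only the magnitude of the correlation; the sign has to come from the direction of the reported effect.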
In humans
- In 2016, r ≈ 0.30 for educational attainment variation at age 16.[5] This polygenic score was based on a GWAS using data from about 293,000 people.[12]
- In 2016, r ≈ 0.31 for case/control status for first-episode psychosis.[13]
In non-human animals
- In 2016, r ≈ 0.30 for variation in milk fat%.[14]
- In 2014, r ≈ 0.18 to 0.46 for various measures of meat yield, carcass value etc.[15]
In plants
- In 2015, r ≈ 0.55 for total root length in maize (Zea mays L.).[16]
- In 2014, r ≈ 0.03 to 0.99 across four traits in barley.[17]
References
- [1] de Vlaming, Ronald; Groenen, Patrick J. F. (2015). "The Current and Future Use of Ridge Regression for Prediction in Quantitative Genetics". BioMed Research International. 2015: 1–18. doi:10.1155/2015/143712.
- [2] Dudbridge, Frank (2013-03-21). "Power and Predictive Accuracy of Polygenic Risk Scores". PLOS Genet. 9 (3): e1003348. ISSN 1553-7404. PMC 3605113. PMID 23555274. doi:10.1371/journal.pgen.1003348.
- [3] Spindel, Jennifer E.; McCouch, Susan R. (2016-09-01). "When more is better: how data sharing would accelerate genomic selection of crop plants". New Phytologist. 212: 814–826. ISSN 1469-8137. PMID 27716975. doi:10.1111/nph.14174.
- [4] Spiliopoulou, Athina; Nagy, Reka; Bermingham, Mairead L.; Huffman, Jennifer E.; Hayward, Caroline; Vitart, Veronique; Rudan, Igor; Campbell, Harry; Wright, Alan F. (2015-07-15). "Genomic prediction of complex human traits: relatedness, trait architecture and predictive meta-models". Human Molecular Genetics. 24 (14): 4167–4182. ISSN 0964-6906. PMC 4476450. PMID 25918167. doi:10.1093/hmg/ddv145.
- [5] Selzam, S.; Krapohl, E.; von Stumm, S.; O'Reilly, P. F.; Rimfeld, K.; Kovas, Y.; Dale, P. S.; Lee, J. J.; Plomin, R. (2016-07-19). "Predicting educational achievement from DNA". Molecular Psychiatry. ISSN 1476-5578. PMID 27431296. doi:10.1038/mp.2016.107.
- [6] James, Gareth (2013). An Introduction to Statistical Learning: with Applications in R. Springer. ISBN 978-1461471370.
- [7] Haws, David C.; Rish, Irina; Teyssedre, Simon; He, Dan; Lozano, Aurelie C.; Kambadur, Prabhanjan; Karaman, Zivan; Parida, Laxmi (2015-10-06). "Variable-Selection Emerges on Top in Empirical Comparison of Whole-Genome Complex-Trait Prediction Methods". PLOS ONE. 10 (10): e0138903. ISSN 1932-6203. PMC 4595020. PMID 26439851. doi:10.1371/journal.pone.0138903.
- [8] Ware et al 2017, "Heterogeneity in polygenic scores for common human traits"
- [9] Gianola & Rosa 2015, "One Hundred Years of Statistical Developments in Animal Breeding"
- [10] Heslot, Nicolas; Jannink, Jean-Luc; Sorrells, Mark E. (2015-01-02). "Perspectives for Genomic Selection Applications and Research in Plants". Crop Science. 55 (1). ISSN 0011-183X. doi:10.2135/cropsci2014.03.0249.
- [11] Shulman, Carl; Bostrom, Nick (2014-02-01). "Embryo Selection for Cognitive Enhancement: Curiosity or Game-changer?". Global Policy. 5 (1): 85–92. ISSN 1758-5899. doi:10.1111/1758-5899.12123.
- [12] Okbay, Aysu; Beauchamp, Jonathan P.; Fontana, Mark Alan; Lee, James J.; Pers, Tune H.; Rietveld, Cornelius A.; Turley, Patrick; Chen, Guo-Bo; Emilsson, Valur. "Genome-wide association study identifies 74 loci associated with educational attainment". Nature. 533 (7604): 539–542. PMC 4883595. PMID 27225129. doi:10.1038/nature17671.
- [13] Vassos, Evangelos; Forti, Marta Di; Coleman, Jonathan; Iyegbe, Conrad; Prata, Diana; Euesden, Jack; O'Reilly, Paul; Curtis, Charles; Kolliakou, Anna. "An Examination of Polygenic Score Risk Prediction in Individuals With First-Episode Psychosis". Biological Psychiatry. PMID 27765268. doi:10.1016/j.biopsych.2016.06.028.
- [14] Hayr, M. K.; Druet, T.; Garrick, D. J. (2016-04-01). "027 Performance of genomic prediction using haplotypes in New Zealand dairy cattle". Journal of Animal Science. 94 (supplement2): 13. ISSN 1525-3163. doi:10.2527/msasas2016-027.
- [15] Chen, L.; Vinsky, M.; Li, C. (2015-02-01). "Accuracy of predicting genomic breeding values for carcass merit traits in Angus and Charolais beef cattle". Animal Genetics. 46 (1): 55–59. ISSN 1365-2052. doi:10.1111/age.12238.
- [16] Pace, Jordon; Yu, Xiaoqing; Lübberstedt, Thomas (2015-09-01). "Genomic prediction of seedling root length in maize (Zea mays L.)". The Plant Journal. 83 (5): 903–912. ISSN 1365-313X. doi:10.1111/tpj.12937.
- [17] Sallam, A. H.; Endelman, J. B.; Jannink, J.-L.; Smith, K. P. (2015-03-01). "Assessing Genomic Selection Prediction Accuracy in a Dynamic Barley Breeding Population". The Plant Genome. 8 (1). ISSN 1940-3372. doi:10.3835/plantgenome2014.05.0020.
Further reading
- Agerbo; et al. (2015). "Polygenic Risk Score, Parental Socioeconomic Status, Family History of Psychiatric Disorders, and the Risk for Schizophrenia: A Danish Population-Based Study and Meta-analysis" (PDF). doi:10.1001/jamapsychiatry.2015.0346.
- Benyamin; et al. (2014). "Childhood intelligence is heritable, highly polygenic and associated with FNBP1L". doi:10.1038/mp.2012.184.
- Breen; et al. (2016). "Translating genome-wide association findings into new therapeutics for psychiatry". doi:10.1038/nn.4411.
- Bulik-Sullivan; et al. (2015). "LD Score regression distinguishes confounding from polygenicity in genome-wide association studies". doi:10.1038/ng.3211.
- Carey; et al. (2016). "Associations between Polygenic Risk for Psychiatric Disorders and Substance Involvement". doi:10.3389/fgene.2016.00149.
- Carneiro; et al. (2014). "Rabbit genome analysis reveals a polygenic basis for phenotypic change during domestication".
- Conley; et al. (2016). "Assortative mating and differential fertility by phenotype and genotype across the 20th century". doi:10.1073/pnas.1523592113. (appendix)
- Conley; et al. (2016). "Changing Polygenic Penetrance on Phenotypes in the 20th Century Among Adults in the US Population". doi:10.1038/srep30348.
- Davies; et al. (2011). "Genome-wide association studies establish that human intelligence is highly heritable and polygenic".
- Domingue; et al. (2015). "Polygenic Influence on Educational Attainment: New Evidence From the National Longitudinal Study of Adolescent to Adult Health".
- Dudbridge (2013). "Power and Predictive Accuracy of Polygenic Risk Scores".
- Germine; et al. (2016). "Association between polygenic risk for schizophrenia, neurocognition and social cognition across development".
- Kirkpatrick; et al. (2014). "Results of a 'GWAS Plus': General Cognitive Ability Is Substantially Heritable and Massively Polygenic".
- Krapohl; et al. (2015). "Phenome-wide analysis of genome-wide polygenic scores". doi:10.1038/mp.2015.126.
- Martin; et al. (2016). "Population genetic history and polygenic risk biases in 1000 Genomes populations" (PDF).
- Papageorge; Thom (2016). "Genes, Education, and Labor Market Outcomes: Evidence from the Health and Retirement Study" (PDF).
- Pasaniuc; Price (2016). "Dissecting the genetics of complex traits using summary association statistics" (PDF).
- Plomin; et al. (2009). "Common disorders are quantitative traits".
- Power; et al. (2015). "Polygenic risk scores for schizophrenia and bipolar disorder predict creativity" (PDF).
- Robinson; et al. (2015). "Population genetic differentiation of height and body mass index across Europe" (PDF).
- Srinivasan; et al. (2015). "Genetic Markers of Human Evolution Are Enriched in Schizophrenia".
- Stergiakouli; et al. (2016). "Association between polygenic risk scores for attention-deficit hyperactivity disorder and educational and cognitive outcomes in the general population".
- Visscher; Wray (2015). "Concepts and Misconceptions about the Polygenic Additive Model Applied to Disease" (PDF).
- Woodley; et al. (2016). "How cognitive genetic factors influence fertility outcomes: A mediational SEM analysis".
- Wray; et al. (2014). "Research review: Polygenic methods and their application to psychiatric traits" (PDF).
- Zheng; et al. (2016). "LD Hub: a centralized database and web interface to perform LD score regression that maximizes the potential of summary level GWAS data for SNP heritability and genetic correlation analysis" (PDF).
- Sariaslan; et al. (2016). "Schizophrenia and subsequent neighborhood deprivation: revisiting the social drift hypothesis using population, twin and molecular genetic data".
- So; Shan (2016). "Exploring the predictive power of polygenic scores derived from genome-wide association studies: a study of 10 complex traits".