A haplotype regression approach for genetic evaluation using sequences from the 1000 Bull Genomes Project

Kenza Lakhssassi, Oscar González-Recio



Haplotypes from sequencing data may improve prediction accuracy in genomic evaluations, as haplotypes are in stronger linkage disequilibrium with quantitative trait loci than markers from SNP chips. This study focuses, first, on the creation of haplotypes in a population sample of 450 Holstein animals with full-sequence data from the 1000 Bull Genomes Project and, second, on incorporating them into the whole-genome prediction model. In total, 38,319,258 SNPs (and indels) from next-generation sequencing were included in the analysis. After removing variants with minor allele frequency (MAF) below 0.025, 13,912,326 SNPs were available for haplotype extraction with findhap.f90. Haploblocks contained on average 924 SNPs (166,552 bp). Unique haplotypes accounted for around 97% of the haplotypes on every chromosome and were discarded, leaving 153,428 haplotypes. Estimated haplotypes contributed substantially to the total variance of genomic estimated breeding values for kilograms of protein, Global Type Index, Somatic Cell Score and Days Open (between 32 and 99.9%). Haploblocks containing haplotypes with large effects were selected, for each trait, using two thresholds on the haplotype effect distribution: effects more than 3 standard deviations (SD) above or below the mean, and effects more than 1 SD above the mean. Results showed that filtering by 3 SD would not be enough to capture a large proportion of the genetic variance, whereas filtering by 1 SD could be useful, although model convergence should be considered. Additionally, sequence haplotypes captured genetic variance beyond the polygenic effect for traits undergoing lower selection intensity, such as fertility and health traits.
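The SD-based selection of large-effect haplotypes described above can be illustrated with a minimal sketch. It is not the authors' code: the function name `select_large_effects`, the toy effect values, and the use of the sample standard deviation are assumptions made for illustration. The sketch also applies the threshold on both sides of the mean for any multiplier `k`, which for the 1 SD case is a guess, since the abstract only mentions effects above the mean.

```python
from statistics import mean, stdev


def select_large_effects(effects, k):
    """Return indices of haplotypes whose effect lies more than k SDs from the mean.

    Assumption: two-sided threshold and sample SD; the study may differ.
    """
    mu = mean(effects)
    sd = stdev(effects)
    return [i for i, e in enumerate(effects) if abs(e - mu) > k * sd]


# Hypothetical haplotype effects: mostly near zero, a few moderate or large.
toy_effects = [0.0] * 96 + [1.0, -1.0, 4.0, -4.0]

strict = select_large_effects(toy_effects, 3)  # only the most extreme effects
loose = select_large_effects(toy_effects, 1)   # also keeps moderate effects

print(len(strict), len(loose))
```

With these toy values the 3 SD rule keeps only the two most extreme haplotypes, while the 1 SD rule keeps four, which mirrors the trade-off noted in the abstract: a looser threshold retains more haplotypes (and potentially more variance) at the cost of a larger model.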


full sequence; Holstein; findhap.f90; Bayesian model





DOI: 10.5424/sjar/2017154-11736