Genomic differentiation among varieties of Iberian pig

Aim of study: The objective of this study was to identify the autosomal genomic regions associated with genetic differentiation between three commercial strains of Iberian pig. Area of study: Extremadura (Spain). Material and methods: We used the Porcine v2 BeadChip to genotype 349 individuals from three varieties of Iberian pig (EE, Entrepelado; RR, Retinto; and TT, Torbiscal) and their crosses. After standard filtering of the Single Nucleotide Polymorphism (SNP) markers, 47, 67, and 123 haplotypic phases from EE, RR, and TT origins were identified. The allelic frequencies of 31,180 SNP markers were used to calculate the fixation index (FST) that were averaged in sliding windows of 2Mb. Main results: The results confirmed the greater genetic closeness of the EE and RR varieties, and we were able to identify several genomic regions with a divergence greater than expected. The genes present in those genomic regions were used to perform an Overrepresentation Enrichment Analysis (ORA) for the Gene Ontology (GO) terms for biological process. The ORA indicated that several groups of biological processes were overrepresented: a large group involving morphogenesis and development, and others associated with neurogenesis, cellular responses, or metabolic processes. These results were reinforced by the presence of some genes within the genomic regions that had the highest genomic differentiation. Research highlights: The genomic differentiation among varieties of the Iberian pig is heterogeneous along the genome. The genomic regions with the highest differentiation contain an overrepresentation of genes related with morphogenesis and development, neurogenesis, cellular responses and metabolic processes. Additional key words: Sus scrofa; single nucleotide polymorphism; founder haplotypes; candidate genes; gene ontology. Abbreviations used: EE (Entrepelado); FST (Fixation Index); GO (Gene Ontology); MDS (Multidimensional Scaling Analysis); ORA (Overrepresentation Enrichment Analysis); QTL (Quantitative Trait Loci); RR (Retinto) SNP (Single Nucleotide Polymorphism); TT (Torbiscal). Authors’ contributions: Conceived, designed and performed the experiments: NIE, JLN, JC, MMHV, MJGS and LV. Analyzed the data: IA and LV. Wrote the paper: LV. All authors read and approved the final manuscript. Citation: Alonso, I; Ibáñez-Escriche, N; Noguera, JL; Casellas, J; Martín de Hijas-Villalba, M; Gracia-Santana, MJ; Varona, L (2020). Genomic differentiation among varieties of Iberian pig. Spanish Journal of Agricultural Research, Volume 18, Issue 1, e0401. https://doi.org/10.5424/sjar/2020181-15411 Supplementary material (Tables S1, S2 and S3) accompanies the paper on SJAR’s website Received: 05 Jul 2019. Accepted: 26 Feb 2020. Copyright © 2020 INIA. This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International (CC-by 4.0) License. Funding Agencies/Institutions Project/Grant Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Spain RTA2012-00054-C02-01 Ministry of Science, Innovation and Universities, Spain CGL2016-80155-R; IDI-20170304 (CDTI) Competing interests: The authors have declared that no competing interests exist. Correspondence should be addressed to Luis Varona: lvarona@unizar.es Inés Alonso, Noelia Ibáñez-Escriche, José L. Noguera, J., et al. Spanish Journal of Agricultural Research March 2020 • Volume 18 • Issue 1 • e0401 2 in the analysis, which covered an autosomal genome of 22.61 Mb that had a density of one SNP marker per 7251.38 bp. In first place, we tried to corroborate the divergence between varieties by using a multidimensional scaling analysis (MDS) with the cmdscale () command from R package stats (R Core Team, 2019), and a maximum likelihood estimation of individual ancestries using the ADMIXTURE software (Alexander et al., 2009) with the assumption of three populations. In a second step, we performed an imputation process of missing alleles and the reconstruction of the haplotype phases for the genotyped individuals and their parents with the FImpute software (Sargolzaei et al., 2014). First, a haplotype library built from reference individuals (24 sires and 30 dams with more than 4 progeny) was generated using the save_hap_lib option of the FImpute software. Second, we used this haplotype library to reconstruct the founder haplotypic phases of each Iberian pig variety using ad-hoc software written in FORTRAN90 and we were able to identify 47, 67 and 123 different haplotypes for EE, RR and TT, respectively. Third, these haplotypes were used to calculate allelic frequencies for each population and SNP by counting the alleles and dividing by the number of haplotypes. These allelic frequencies were used to calculate the Weir-Cockerham (Weir & Cockerham, 1984) estimator of the fixation index (FSTi) (Wright, 1951) for each (ith) SNP among the three populations and for each specific pair of populations, Entrepelado-Retinto (FST(ER)i), Entrepleado-Torbiscal (FST(ET)i ) and Retinto-Torbiscal (FST(RT)i). Finally, the single SNP FST statistics were averaged in sliding windows of 1, 2 and 3 Mb centered at each SNP. The genes located within the genomic regions associated with an average FST greater than the 95th and 99.9th percentiles were identified using the Biomart Tool (Smedley et al., 2015) with the Sus scrofa 11.1 genome map. The genes within the genomic regions that had an average FST greater than the 95th percentile were used in an overrepresentation enrichment analysis (ORA) with the gene ontology (GO) terms for biological process for Homo sapiens and Sus scrofa with the WEB-based Gene SeT AnaLysis Toolkit (Wang et al., 2017; www.webgestalt.org) and the complete genome as a reference set. Results and discussion Genomic divergence between populations The results of the MDS and the estimation of individual ancestries with maximum likelihood are presented in Figures 1 and 2, respectively. Both approachIntroduction The Iberian pig is a native breed from the Iberian Peninsula which has high adipogenic capability and meat of excellent quality (Ventanas et al., 2005). It is the largest extant population of the Mediterranean-type pig and, traditionally, its geographical distribution has been limited to the southwest of the Iberian Peninsula. The population structure of the Iberian pig comprises several varieties that have diverged because of genetic drift, selection, and adaptation. Some authors have reported that genetic variability among Iberian varieties is as high as it is among commercial breeds of white pig (Laval et al., 2000; Martínez et al., 2000; Fabuel et al., 2004). Genomic diversity between Iberian pig varieties is not expected to be homogeneous throughout the genome. In fact, it is plausible that some genomic regions exhibit higher divergence because of selection or adaptation processes (Qanbari & Simianer, 2014). As far as we know, few studies (Herrero-Medrano et al., 2013; Silió et al., 2016) have studied the genomic diversity of the Iberian population using high-density genotypes, and none of them has analyzed the genomic differentiation between the Iberian varieties throughout of the genome. Therefore, the objective of this study was to identify the genomic regions that had the highest degree of differentiation among three of the most widely used Iberian pig varieties (EE, Entrepelado; RR, Retinto; and TT, Torbiscal). In addition, the study identified candidate genes and the biological processes associated with the differentiation between varieties. Material and methods Genotypes from 349 individuals with the PorcineSNP60 v2 BeadChip (Illumina Inc., San Diego, USA) were used. The data set included purebred Entrepelado (EE, n=21 individuals), Retinto (RR, n=50) and Torbiscal (TT, n=78). In addition, there were individuals from all of the reciprocal crosses: Entrepelado × Retinto (ER) and Retinto × Entrepelado (RE, n=25), Entrepelado × Torbiscal (ET) and Torbiscal × Entrepelado (TE, n=37), and Retinto × Torbiscal (RT) and Torbiscal × Retinto (TR, n=138). All analyzed individuals were descendants from 44 sires (12 EE, 14 RR and 18 TT) and 139 dams (24 EE, 42 RR and 73 TT) with an average progeny (± standard deviation) of 7.93 ± 8.01 and 2.51 ± 8.01, respectively. Genotypes were filtered using PLINK (Purcell et al., 2007). The criteria of selection included a call rate higher than 95% at the individual and single nucleotide polymorphism (SNP) levels and a minor allelic frequency (MAF) > 0.01 for autosomal SNPs. Subject to those criteria, 31,180 of 61,565 SNP’s were included

2 in the analysis, which covered an autosomal genome of 22.61 Mb that had a density of one SNP marker per 7251.38 bp. In first place, we tried to corroborate the divergence between varieties by using a multidimensional scaling analysis (MDS) with the cmdscale () command from R package stats (R Core Team, 2019), and a maximum likelihood estimation of individual ancestries using the ADMIXTURE software (Alexander et al., 2009) with the assumption of three populations.
In a second step, we performed an imputation process of missing alleles and the reconstruction of the haplotype phases for the genotyped individuals and their parents with the FImpute software (Sargolzaei et al., 2014). First, a haplotype library built from reference individuals (24 sires and 30 dams with more than 4 progeny) was generated using the save_hap_lib option of the FImpute software. Second, we used this haplotype library to reconstruct the founder haplotypic phases of each Iberian pig variety using ad-hoc software written in FORTRAN90 and we were able to identify 47, 67 and 123 different haplotypes for EE, RR and TT, respectively. Third, these haplotypes were used to calculate allelic frequencies for each population and SNP by counting the alleles and dividing by the number of haplotypes. These allelic frequencies were used to calculate the Weir-Cockerham (Weir & Cockerham, 1984) estimator of the fixation index (F STi ) (Wright, 1951) for each (ith) SNP among the three populations and for each specific pair of populations, Entrepelado-Retinto (F ST (ER) i ), Entrepleado-Torbiscal (F ST (ET) i ) and Retinto-Torbiscal (F ST (RT) i ). Finally, the single SNP F ST statistics were averaged in sliding windows of 1, 2 and 3 Mb centered at each SNP.
The genes located within the genomic regions associated with an average F ST greater than the 95 th and 99.9 th percentiles were identified using the Biomart Tool (Smedley et al., 2015) with the Sus scrofa 11.1 genome map. The genes within the genomic regions that had an average F ST greater than the 95 th percentile were used in an overrepresentation enrichment analysis (ORA) with the gene ontology (GO) terms for biological process for Homo sapiens and Sus scrofa with the WEB-based Gene SeT AnaLysis Toolkit (Wang et al., 2017;www.webgestalt.org) and the complete genome as a reference set.

Introduction
The Iberian pig is a native breed from the Iberian Peninsula which has high adipogenic capability and meat of excellent quality (Ventanas et al., 2005). It is the largest extant population of the Mediterranean-type pig and, traditionally, its geographical distribution has been limited to the southwest of the Iberian Peninsula. The population structure of the Iberian pig comprises several varieties that have diverged because of genetic drift, selection, and adaptation. Some authors have reported that genetic variability among Iberian varieties is as high as it is among commercial breeds of white pig (Laval et al., 2000;Martínez et al., 2000;Fabuel et al., 2004).
Genomic diversity between Iberian pig varieties is not expected to be homogeneous throughout the genome. In fact, it is plausible that some genomic regions exhibit higher divergence because of selection or adaptation processes (Qanbari & Simianer, 2014). As far as we know, few studies (Herrero-Medrano et al., 2013;Silió et al., 2016) have studied the genomic diversity of the Iberian population using high-density genotypes, and none of them has analyzed the genomic differentiation between the Iberian varieties throughout of the genome. Therefore, the objective of this study was to identify the genomic regions that had the highest degree of differentiation among three of the most widely used Iberian pig varieties (EE, Entrepelado; RR, Retinto; and TT, Torbiscal). In addition, the study identified candidate genes and the biological processes associated with the differentiation between varieties.

Material and methods
Genotypes from 349 individuals with the Porcine-SNP60 v2 BeadChip (Illumina Inc., San Diego, USA) were used. The data set included purebred Entrepelado (EE, n=21 individuals), Retinto (RR, n=50) and Torbiscal (TT, n=78). In addition, there were individuals from all of the reciprocal crosses: Entrepelado × Retinto (ER) and Retinto × Entrepelado (RE, n=25), Entrepelado × Torbiscal (ET) and Torbiscal × Entrepelado (TE, n=37), and Retinto × Torbiscal (RT) and Torbiscal × Retinto (TR, n=138). All analyzed individuals were descendants from 44 sires (12 EE, 14 RR and 18 TT) and 139 dams (24 EE, 42 RR and 73 TT) with an average progeny (± standard deviation) of 7.93 ± 8.01 and 2.51 ± 8.01, respectively. Genotypes were filtered using PLINK (Purcell et al., 2007). The criteria of selection included a call rate higher than 95% at the individual and single nucleotide polymorphism (SNP) levels and a minor allelic frequency (MAF) > 0.01 for autosomal SNPs. Subject to those criteria, 31,180 of 61,565 SNP's were included 3 Genomic differentiation among varieties of Iberian pig between the purebred populations that generated the cross. In addition, the estimation of individual ancestries after the assumption of three populations distributed the purebred individuals into three different ancestries and assigned crossbred individuals with approximately half ancestry from each parental population.
Average ± standard deviation of the F ST results between the three varieties of Iberian pig was 0.069 ± 0.060, which is lower than the estimate reported by Fabuel et al. (2004), who used a set of 36 microsatellites (F ST =0.129). However, they are not directly comparable since microsatellites have a higher mutation rate and, therefore, F ST estimates are not in the same scale. In addition, Faubel et al. (2004) included five varieties (Entrepelado, Retinto, Lampiño, Torbiscal, and Guadyerbas), and the Guadyerbas population had the greatest genetic distance from the other populations. Therefore, a greater estimate of F ST was expected in their analysis. We did not include Guadyerbas because it is almost entirely restricted to conservation programs. The mean ± standard deviation of F ST statistics between pairs of populations was 0.045 ± 0.055 between EE and RR, 0.049 ± 0.059 between EE and TT, and 0.057 ± 0.072 between RR and TT. The results of paired t-tests were highly significant (p<1e-8) for all comparisons. The results confirmed the closeness between Entrepelado and Retinto, as it was previously reported by Fabuel et al. (2004).
The results of the genomic scans through the porcine autosomal genome of the single SNP F ST statistic for all populations and after averaging them in sliding windows of 1, 2, and 3 Mb centered at each SNP are presented in Figure 3. The distribution along the genome of F ST statistics calculated with a single SNP was extremely noisy. Therefore, it was not possible to extract a clear pattern of the genomic differentiation between populations and confirmed the need to average estimates of the F ST in wider genomic regions. The number of markers included in each window was 18.37 ± 6.63, 33.77 ± 11.36, and 48.97 ± 15.68 SNP for sliding windows of 1, 2, and 3 Mb, respectively. The results from the genomic scan with sliding windows of 1 Mb were somewhat noisy, and the results based on sliding windows of 2 Mb and 3 Mb were very similar. Therefore, to achieve a compromise between noise reduction and the genomic size of the windows, we decided to explore the results based on 2 Mb sliding windows.
The genomic scans for each pair of populations (EE and RR, EE and TT and RR and TT) with sliding windows of 2 Mb are presented in Figure 4. The distributions of the F ST estimates along the autosomal chromosomes were es indicate that the genomic data clearly identified the genetic structure of the populations. The representation of the MDS with the first two dimensions distributed the individuals in six groups that correspond to the pure (EE, RR and TT) and crossbred (ER, ET and RT) populations. Graphically, the crossbred were located   Those results are reinforced by the genes within the genomic regions greater than the 99.9 th percentile (0.157) which were located at SSC8 (56983392-60232132 bp), SSC15 (80515676-86990138 bp) and SSC17 (42130987-42480736 bp) as presented in the Table S1 [suppl.]. Among them is a family of HOXD (Homeobox protein) genes that encode a family of transcription factors that play a crucial role in morphogenesis (Myers, 2008), jointly, with the tightly linked EVX2 (Even-Skipped Homeobox 2) (Hérault et al., 1997), and the SP9 (Sp9 Transcription Factor) (Kawakami et al., 2004). In addition, the NEUROD1 (Neurogenic differentiation 1) is a transcription factor involved in regulatory networks in embryonic stem cells (Marchand et al., 2009), the OLA1 (Obg Like ATPase 1) gene plays a role on the attachment of cells to the extracellular matrix (Jeyabal et al., 2014), and the ATF2 (Activating transcription factor-2) which has been found to affect skeletal growth (Vale-Cruz et al., 2008). Some other interesting genes located within them that are related with morphogenesis are the CHN1 (Chimerin 1) and the PRKRA (protein kinase, interferon inducible double stranded RNA dependent activator). The CHN1 is mostly expressed in the brain and it is associated with the early development of the nervous system (Lim et al., 1992) and PRKRA has been related with the development of the cerebellum (Yong et al., 2015). An additional evidence of the relationship of those genomic regions with the embryological development is that, in pigs, they have been associated with QTL for teat numbers in SSC8 (Velardo et al., 2016), number of mummies clearly different among the three genomic scans. However, we were able to detect some degree of similarity since the correlations between the F ST estimates obtained from EE and RR and EE and TT, EE and RR and RR and TT and EE and TT and RR and TT were 0.218 (p<0.001), 0.214 (p<0.001), 0.317 (p<0.001), respectively.

Biological processes and candidate genes between Entrepelado and Retinto
The genomic regions that had an average F ST( ER) greater than the 95 th percentile (0.082) contained 651 genes, which were used in an ORA for the GO for biological process. Among them, 569 and 157 genes were annotated to functional categories in the Homo sapiens and Sus scrofa databases, respectively. The enriched GO terms that had a p-value < 0.0001 are presented in Table 1. We identified up to 15 and 3 terms within the human and the porcine databases, respectively. All of the GO terms identified with the Homo sapiens reference (anterior/posterior pattern specification, skeletal system development, limb morphogenesis, appendage morphogenesis, pattern specification process, regionalization, embryonic skeletal system development, appendage development, limb development, skeletal system morphogenesis, embryo development, embryonic limb morphogenesis and embryonic appendage morphogenesis) and two identified with the Sus scrofa (skeletal system development and anatomical structure morphogenesis) were associated with embryogenesis and morphogenesis. In addition, a biological process 3.87e-05 skeletal system development (Sus scrofa) 5.09e-05 anatomical structure morphogenesis (Sus scrofa) 6.38e-05 regulation of gene expression (Sus scrofa) 6.54e-05 6 (Ponsuksili et al., 2015) and hemoglobin content  in SSC2, intramuscular fat content (Cepica et al., 2012) and vertebra number (Rohrer et al., 2015) in SSC6 and the ratio to non-productive days in SSC14 (Onteru et al., 2011). In addition, some of the genes within or near those genomic regions (see Table S2 [suppl.]) are associated with the same biological processes highlighted above, and are good candidates for had being affected by selection or adaptation; e.g., INSR (insulin receptor) or ADCYAP1 (Adenylate Cyclase Activating Polypeptide 1). The INSR gene has been proposed as a candidate gene for intramuscular fat content by Cepica et al. (2012) while ADCYAP1 is a member of the glucagon superfamily of hormones that are involved in in growth, metabolism, and immune response (Sherwood et al., 2000). In addition, NWD1 (NACHT and WD Repeat Domain Containing 1) modulates androgen receptor signaling (Correa et al., 2014), and MN1 (Transcriptional activator MN1) is involved in the development of craniofacial traits (Pallares et al., 2015). Further, another interesting gene is the CD209 (dendritic cell-specific intercellular adhesion molecule-3-grabbing non-integrin, DC-SIGN), that functions as an important pattern recognition receptor (PRR) in immune defense and plays a role in the immune modulation during pathogen infection (Soilleux et al., 2002).

Biological processes and candidate genes between Retinto and Torbiscal
The genomic regions that had an average F ST (RT) within the 95 th percentile (0.101) in sliding windows of 2 Mb contained 596 genes, of which 530 and 146 were annotated in Homo sapiens and Sus scrofa databases, respectively. The results of the ORA analyses identified statistically significant (p<0.0001) GO terms when crossed with the Homo sapiens database, only (Table 3). There were up to four GO terms associated with development of the neuronal system (generation (Onteru et al., 2012) and stillbirths (Schneider 5, 2015) in SSC15.
Moreover, some of those genes such as the HOXD family are also associated with the regulation of gene expression, but it is noteworthy that the ACRT5 (ARP5 Actin Related Protein 5 Homolog) is involved in the INO80 complex that contributes to transcription, DNA repair, and DNA replication (Conaway & Conaway, 2009). Therefore, they can be related with the biological process last identified (regulation of gene expression) in the Sus scrofa database.

Biological processes and candidate genes between Entrepelado and Retinto
The genomic regions that were within the 95 th percentile (0.092) of the average F ST (ET) in sliding windows of 2 Mb contained 886 genes, of which 784 and 228 were annotated in the Homo Sapiens and Sus scrofa databases, respectively. The results of the ORA with these genes are presented in Table 2. One of the enriched GO terms for biological processes was also associated with morphogenesis (animal organ morphogenesis) and the other terms were associated with the global metabolism of the individuals through metabolic or catabolic processes (collagen catabolic process, multicellular organismal catabolic process, and multicellular organism metabolic process), regulation of hormone secretion (negative regulation of hormone secretion), or the ubiquitination of proteins (positive regulation of ubiquitin-protein transferase activity). Furthermore, six genomic regions were detected that had an average F ST (ET) greater than the 99.9 th percentile (0.178) and were at SSC2 (60177754-61303696, 71588075-71588075 and 92958395-93175114 bp), SSC6 (104376948-105459825 bp), SSC10 (36887963-37186755 bp), and SSC14 (45509383-45509383 bp). A search of the porcine QTL database (https://www. animalgenome.org/QTLdb/pig/) indicated that these regions have been associated with copying behavior

Final remarks
The main conclusion of this study is that the processes of differentiation among Iberian pig varieties have heterogeneously affected the autosomal genome. A first approximation has identified potential candidate genes, most of which are associated with morphogenesis, neuronal development, regulation of metabolism, or cellular response to stressors. The presence of those candidate genes is coherent with the recent evolution of the Iberian pig populations, they have evolved through adaptation to harsh environmental conditions. Furthermore, producers have subjected the Iberian pig to "empirical" selection in which adipogenic capacity and morphological traits have played an important role. Nevertheless, further research must be done to confirm these results.
of neurons, neurogenesis, neuron differentiation and neuron projection development), and another term was associated with the general development of the organism (regulation of multicellular organismal development). Furthermore, two GO terms were associated with the responses of the organism to stress (regulation of response to stress) and lipopolysaccharides (cellular response to lipopolysaccharide), and another was associated with microtubule activity (regulation of microtubule motor activity). Some of the genes within the genomic regions that had an average F ST that was in the 99.9 th percentile (0.189) confirmed these results (see Table S3 [suppl.]). They were located at SSC1 (77056462-77472299, 90748703-90896710, and 141989498-142042139 bp), SSC6 (104376948-105459825 bp), SSC8 (89447995-89679243 bp), and SSC12 (28731262-28831639 bp). The genomic regions of SSC6 was also identified within the most divergent genomic regions between EE and TT, which strengthens the argument that, among other, INSR and AD-CYAP1 genes may be good candidates to have been affected by selection or adaptation processes that have influenced the genetic configuration of the TT population. Moreover, other genes of note include FYN (FYN Proto-Oncogene, Src Family Tyrosine Kinase), which is involved in the early stages of neurogenesis (Yagi et al., 1994), CDK19 (Cyclin Dependent Kinase 19), a regulator of the p53 network for cellular response to stress (Audetat et al., 2017), and TRAF3IP2 (TRAF3 interacting protein), which plays a central role in innate immunity in response to pathogens (Wu et al., 2013). In addition, COL12A1 (Collagen, type XII, alpha-1) is involved in bone formation (Izu et al., 2011) and CA10 (Carbonic anhydrase-related protein CA10) has been identified as an important neurexin ligand (Sterky et al., 2017). Finally, it must be remarked that the genomic regions of SSC1 (77056462-77472299) were associated by Fontanesi et al. (2017) with a large QTL for intramuscular fat and that the genomic regions of SSC6 and SSC12 has been related with the genetic variation in vertebra numbers (Rohrer et al., 2015). Table 3. Enriched gene ontology (GO) terms for biological processes (p < 0.0001) with genes located within the genomic regions that have an average F ST over the 95 th percentile between Retinto and Torbiscal Iberian pig populations based on the Homo sapiens and Sus scrofa databases.