Inference of hidden population substructure of the Iberian pig breed using multilocus microsatellite data

The census and structure of the Iberian pig breed have changed substantially over the last decades. Bayesian methods based on multilocus genotypes were applied to ascertain the current breed structure and to identify genetically distinctive populations. DNA samples from 170 Iberian pigs previously assigned to the strains or varieties Torbiscal, Guadyerbas, Retinto, Entrepelado and Lampiño, and from 64 Duroc pigs, were genotyped for 36 microsatellites. A best partition of only five clusters was estimated in the clustering analysis at group level, when the previous assignment to populations was taken into account. However, the individual-based assessment of population structure, which ignores the previous assignment of animals to populations, showed a more complex partition of ten clusters. Admixture analyses partitioning individuals into the inferred clusters showed an important proportion of admixed individuals among those pre-assigned to the Retinto, Entrepelado and Lampiño varieties. The frequencies of private alleles of the MC1R gene also provided evidence of substantial gene flow between these varieties. The future definition of conservation units in the Iberian breed should consider these results.
Additional key words: clustering, MC1R gene, mixture and admixture analysis, within-breed variation.


Introduction
Diverse local varieties (Negros Lampiños, Retinto and Rubio or Dourado), showing important phenotypic and productive differences, were differentiated within the ancient population of the Iberian pig breed (Odriozola, 1976; Laguna, 1998; Benito et al., 2000). Some of these varieties were exported to American countries during the colonization and became the direct ancestors of Creole pig breeds (Lemus-Flores et al., 2001); they also contributed to the origin of the Duroc-Jersey breed in the United States (Vaughan, 1950).
Spanish Journal of Agricultural Research (2006) 4(1), 37-46

The large breed census was drastically reduced from 1960 onwards due to the outbreak of African swine fever and the depreciation of animal fats. In recent years, the production of pigs of Iberian type has largely increased to satisfy the new demand for top-quality meat and dry-cured products, and the population bottleneck has been reversed. However, as a consequence of the past critical period, some ancestral varieties have disappeared and others could be endangered or blended.
Phylogenetic techniques based on genetic distances estimated from polymorphic microsatellite markers have been the method of choice to assess the genetic diversity of livestock breeds. This approach relies on the a priori definition of populations, and its usefulness is greatly reduced if these populations do not accurately describe the present-day biological reality (Pearse and Crandall, 2004). Genetically similar groups can be labelled differently due to distinct phenotypes and, conversely, phenotypic similarity may mask underlying genetic variation (Rosenberg et al., 2001). Martínez et al. (2000) used this classical approach to analyse the genetic structure of the Iberian breed, and their results mainly supported the division of the breed into the predefined varieties, although the traditional classification was not compatible with some singular cases.
Other methods that construct genetic clusters from a set of individual multilocus genotypes have been proposed as a more flexible alternative to those based on genetic distances. Both genetic distances and clustering methods were used by Fabuel et al. (2004) to analyze genetic diversity and conservation priorities in Iberian pigs. The results of this study and of another one based on the analysis of mitochondrial DNA sequences (Alves et al., 2003) provided new evidence of introgression among the traditional Iberian pig varieties.
Clustering methods make it possible to separate a set of individuals into several populations when their genetic origin is unknown beforehand, or to study the correspondence between inferred genetic clusters and known predefined population categorizations (Pritchard et al., 2000). Recently, fully Bayesian methods have been proposed for estimating hidden population substructure, which treat both the allele frequencies of the molecular markers and the number of genetically diverged populations as random variables (Corander et al., 2003). These methods make it possible to cluster data (genetic mixture analysis) either at group level or at individual level, and also to perform admixture analysis, in which the genome of an individual represents a mixture of alleles of different ancestries (Anderson and Thompson, 2002). The objective of this study was to apply these new statistical tools to detect whether the current breed structure preserves the traditional differentiated varieties or consists of mixed or admixed populations, and to define more accurately the genetic units useful for designing a rational management of the genetic resources of the Iberian breed.

Animals
Two of the five groups of Iberian pigs considered (Guadyerbas and Torbiscal) belong to an early conservation programme. Guadyerbas is a black hairless strain and Torbiscal is a composite strain obtained from ancient black and red varieties (Rodrigáñez et al., 2000; Toro et al., 2000). The complete genealogy of all the animals of these strains is available since 1945, with 18.9 (Guadyerbas) and 21.0 (Torbiscal) generations separating the population founders from the animals genotyped here. The remaining pre-defined Iberian pig groups represent the three main extant varieties: black hairless (Negro Lampiño), red (Retinto) and a black hairy variety (Entrepelado), whose piglets show a chestnut colour at birth. Genomic DNA was extracted from blood using standard protocols. Samples were collected from 170 individuals registered in the breed herdbook, distributed by strains and varieties as follows: 31 Torbiscal, 32 Guadyerbas, 50 Retinto (seven breeding nuclei), 30 Lampiño (three breeding nuclei) and 27 Entrepelado (five breeding nuclei). Due to their historical and current relations with the Iberian pigs, a total of 64 Duroc pigs from seven breeding nuclei were also sampled and analyzed.

Microsatellites
All the animals were genotyped for 36 microsatellite markers, two on each autosome (Table 1). They were chosen for their good reproducibility, high polymorphism and absence of null alleles. In seven of the 18 chromosomes, both microsatellites map to the same chromosome arm with an average genetic distance of 34.5 cM, although the distances were only 13.4 and 14.2 cM for chromosomes 1 and 18, respectively. Amplified microsatellite markers were analyzed with Genescan software on capillary electrophoresis equipment with fluorescent detection (ABI PRISM 3100 genetic analyzer). To increase the accuracy of allele size determination, four control animals were genotyped in all the runs. The genetic variability within each of the sampled populations was measured by the number of alleles (NA) and the expected heterozygosity (He) for each genotyped microsatellite.
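The two variability measures named above can be illustrated with a minimal Python sketch for a single locus. It assumes the usual gene-diversity formula, He = 1 − Σ p_l², with no small-sample correction; the text does not state which estimator was used, and the function name and allele sizes below are hypothetical.

```python
from collections import Counter

def allele_stats(genotypes):
    """Return (number of alleles, expected heterozygosity) for one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid animal.
    Assumption: He = 1 - sum(p_l ** 2), with no small-sample correction.
    """
    # Count every allele copy across all animals.
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = sum(counts.values())
    freqs = [c / total for c in counts.values()]
    he = 1.0 - sum(p * p for p in freqs)
    return len(counts), he

# Hypothetical allele sizes (bp) for four animals at one microsatellite.
sample = [(120, 124), (120, 120), (124, 128), (120, 124)]
n_alleles, he = allele_stats(sample)
print(n_alleles, he)
```

Averaging such per-locus values over the 36 markers would give a population-level summary comparable in spirit to the figures reported in Table 1.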

MC1R genotyping
To obtain additional genetic information, the MC1R intragenic haplotypes (Chr. 6p) present in the analyzed Iberian and Duroc populations were determined following the procedures described by Fernández et al. (2004).

Clustering analysis
Mixture analysis of microsatellite data was performed according to Corander et al. (2003) to provide posterior distributions of partitions S = (s1, ..., sk) of the NP sampling units into k non-empty classes (clusters), which have non-identical allele frequency parameters over the NL genotyped loci. Independently for each particular partition S, the joint distribution of the data and parameters is proportional to the Multinomial-Dirichlet expression

$$\prod_{i=1}^{k} \prod_{j=1}^{N_L} \prod_{l=1}^{N_{A(j)}} p_{ijl}^{\,n_{ijl} + \alpha_j - 1}$$

where pijl is the unknown allele frequency, nijl is the observed number of copies of allele l at locus j among the sampling units in cluster si, and αj is the Dirichlet prior hyperparameter, chosen as αj = 1 / NA(j), with NA(j) being the number of alleles observed at locus j. Both Hardy-Weinberg and linkage equilibrium within each class si were assumed in the above. The prior distribution for the partition parameters is chosen to be uniform over the space of all possible partitions, which are therefore considered a priori equally likely. For small values of NP, it is possible to use complete enumeration to obtain exactly the posterior distribution of the parameters S and k over all the possible partitions.
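To make the role of the Dirichlet prior concrete, the following Python sketch scores a partition by its log marginal likelihood, obtained by integrating the allele frequencies out of a Multinomial-Dirichlet model analytically. It is a hedged illustration and not the BAPS implementation: the function name, the data layout and the toy counts are hypothetical, and the score is defined only up to an additive constant that does not depend on the partition.

```python
from math import lgamma

def log_marginal(cluster_counts):
    """Log marginal likelihood of one partition, up to an additive constant.

    cluster_counts[i][j][l] = copies of allele l at locus j in cluster i.
    Assumption: symmetric Dirichlet prior with alpha_j = 1 / (number of
    alleles at locus j), integrated out for each cluster and locus.
    """
    total = 0.0
    for loci in cluster_counts:
        for counts in loci:
            alpha = 1.0 / len(counts)
            alpha_sum = alpha * len(counts)  # equals 1 under this prior
            n = sum(counts)
            total += lgamma(alpha_sum) - lgamma(alpha_sum + n)
            total += sum(lgamma(alpha + c) - lgamma(alpha) for c in counts)
    return total

# Hypothetical counts: two sampling units, one locus with two alleles,
# each unit fixed for a different allele.
unit_a = [[10, 0]]
unit_b = [[0, 10]]
split = log_marginal([unit_a, unit_b])   # units kept as separate clusters
merged = log_marginal([[[10, 10]]])      # units merged into one cluster
print(split > merged)
```

With a uniform prior over partitions, comparing such scores is equivalent to comparing posterior probabilities, so the partition keeping the two divergent units apart is preferred in this toy case.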
When NP > 10, the number of possible partitions is too large for exhaustive enumeration, and values from the posterior distribution may be obtained using the Metropolis-Hastings algorithm. Additional details of the method can be found in Corander et al. (2003, 2004), and the calculations were performed using the BAPS 2.0 software. In the mixture analysis at group level, the previous assignment of the pigs to six groups was used to inform the analysis, so that NP = 6 is the maximum number of possible clusters. These groups are the sampling units to be clustered in order to study the correspondence between inferred genetic clusters and the pre-defined categorization of breeds or varieties.
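The growth in the number of partitions can be checked directly: the number of ways to split NP labelled sampling units into non-empty clusters is the Bell number B(NP). The short Python sketch below computes it with the Bell triangle; the function name is hypothetical and the code is only meant to show why enumeration is feasible for NP = 6 (203 partitions) but not for hundreds of individuals.

```python
def bell_number(n):
    """Number of partitions of n labelled units into non-empty clusters."""
    row = [1]  # first row of the Bell triangle
    for _ in range(n - 1):
        # Each new row starts with the last entry of the previous row;
        # every further entry adds the entry above-left to its left neighbour.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[-1]

for n_units in (6, 10, 234):
    # Report the number of decimal digits to keep the output readable.
    print(n_units, len(str(bell_number(n_units))))
```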
After inferring the structure, the estimates of the coefficient of genetic differentiation FST (Nei, 1977) and the pairwise genetic distances Dm (Nei, 1972) and DR (Reynolds et al., 1983) were calculated in a Bayesian model-averaged sense, since the distance measures between genetic clusters are obtained by averaging over the posterior distribution of partition parameters (s1, ..., sk). A second mixture analysis was carried out at individual level (NP = 234 individuals) to identify the optimal allocation of individuals to genetically divergent clusters, without any pre-defined categorization.
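As a point of reference for the differentiation coefficient, the sketch below computes FST = (HT − HS) / HT for one locus from known cluster allele frequencies. It is a simplified, hedged illustration: clusters are weighted equally, no sample-size correction is applied, and it is a plain plug-in value rather than the Bayesian model-averaged estimate described above. The function name and frequencies are hypothetical.

```python
def nei_fst(cluster_freqs):
    """FST = (HT - HS) / HT for one locus, with equally weighted clusters.

    cluster_freqs: list of allele-frequency lists, one per cluster,
    all in the same allele order and each summing to 1.
    The locus must be polymorphic overall (HT > 0).
    """
    k = len(cluster_freqs)
    n_alleles = len(cluster_freqs[0])
    # HS: mean expected heterozygosity within clusters.
    hs = sum(1.0 - sum(p * p for p in freqs) for freqs in cluster_freqs) / k
    # HT: expected heterozygosity at the mean allele frequencies.
    mean = [sum(freqs[l] for freqs in cluster_freqs) / k
            for l in range(n_alleles)]
    ht = 1.0 - sum(p * p for p in mean)
    return (ht - hs) / ht

# Two hypothetical clusters with opposite allele frequencies at one locus.
freqs = [[0.8, 0.2], [0.2, 0.8]]
print(nei_fst(freqs))
```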
The admixture clustering analysis provides an additional parameter qi(m), the proportion of the genome of individual m proceeding from cluster si, for partitioning individuals into clusters based on multilocus genotypes. Two admixture analyses were performed, based on previous inferences about clusters obtained by mixture clustering analysis either at group level or at individual level. Inferences about admixture were obtained using the software BAPS version 3.1, which calculates for each individual the posterior mode of qi(m) conditional on the structure parameters S and k, and the posterior probability ratio between the model with admixture and the model where the individual is forced to have pure ancestry. Both models are considered a priori equally likely, so the posterior probability ratio coincides with the Bayes Factor (BF). Following Kass and Raftery (1995), a BF lower than 3.2 was considered as no evidence against the no-admixture hypothesis, a BF between 3.2 and 10 as substantial evidence, and a BF greater than 10 as strong evidence.
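The evidence categories just described amount to a simple threshold rule, sketched below in Python. The function name and labels are hypothetical, and the treatment of values falling exactly on a threshold (3.2 or 10 counted as substantial) is an assumption, since the text does not specify it.

```python
def admixture_evidence(bayes_factor):
    """Classify evidence against the no-admixture hypothesis.

    Thresholds follow the categories described in the text
    (3.2 and 10); boundary handling is an assumption of this sketch.
    """
    if bayes_factor > 10:
        return "strong"
    if bayes_factor >= 3.2:
        return "substantial"
    return "none"

# Hypothetical Bayes factors for three animals.
for bf in (1.5, 4.7, 25.0):
    print(bf, admixture_evidence(bf))
```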

Results
Table 1 shows the number of alleles detected for each genotyped microsatellite, arranged by autosome. This number varies from 4 to 14 (average 7.2 alleles) across the Iberian breed as a whole, and from 3 to 15 in the Duroc breed (average 5.8 alleles). In the Iberian breed, the average number of alleles per population ranged from 2.08 (microsatellite S0219) to 7.8 (microsatellite S0005). Table 1 also gives, for each locus and breed, the observed heterozygosity by direct count and the expected heterozygosity under Hardy-Weinberg equilibrium. The observed values were generally lower than the expected values, indicating heterogeneity between populations within each breed.

Clustering analysis at group level
The results of the group-level mixture analysis showed just one partition of five clusters with differing population frequencies. Four of them corresponded to the Duroc breed (s1), the Guadyerbas (s3) and Torbiscal (s5) strains and the Lampiño variety (s4). The last cluster (s2) combines the Entrepelado and Retinto varieties of Iberian pigs. The posterior mean of the coefficient of genetic differentiation FST among these five clusters was 0.170, with a posterior standard deviation (PSD) of 0.003. The means of the posterior distributions of the Dm and DR distances indicate a maximum distance between Duroc and Guadyerbas, and a minimum distance between the Lampiño variety and the mixed cluster s2 grouping the Retinto and Entrepelado varieties. The values of both genetic distances depend on the number of generations since divergence and on the effective sizes of the populations. However, the Dm distance also depends on the founder frequencies, which could explain its higher values between the Duroc and Iberian clusters (Table 2).
The admixture analysis based on this clustering revealed no evidence of admixture among clusters for pigs pre-assigned to the Duroc breed and to the Guadyerbas and Torbiscal strains. The BF values for the comparison of models with admixture and with pure ancestry were lower than 3.2 for all the individuals grouped into clusters s1, s3 and s5. However, the admixture analysis identified 38 individuals grouped into the remaining clusters (s2 and s4) with at least substantial evidence against the no-admixture hypothesis (BF > 3.2). The respective proportions of these admixed pigs were 12/27, 13/50 and 13/30 for the pre-defined varieties Entrepelado, Retinto and Lampiño. Several details of these results should be highlighted: i) some Entrepelado pigs showed a remarkable proportion of alleles from clusters s3 and s4, corresponding to the black hairless populations Guadyerbas and Lampiño, and other Entrepelado individuals showed admixture with clusters s1 (Duroc) and s4 (Lampiño); ii) cluster s5 (Torbiscal) represented an important proportion of the genome of six Retinto pigs; iii) an important proportion of the genome of six pigs pre-assigned to the Lampiño variety and grouped into cluster s4 is represented by cluster s3 (Guadyerbas); iv) an important proportion of the genome of seven Lampiño individuals is represented by cluster s2 (Entrepelado and Retinto), and an additional proportion of the genome of two of them is represented by cluster s5 (Torbiscal); and finally v) seven of the 38 admixed pigs showed a non-trivial proportion (q1 ≥ 0.10) of Duroc alleles. Strong evidence against the no-admixture hypothesis (BF > 10) was found for 20 of the 38 admixed pigs, which are presented in Table 3.

Clustering analysis at individual level
The results of the individual-based assessment of population structure showed a best partition of ten genetically divergent clusters. The composition of these clusters is described in Table 4. Four results can be highlighted: i) two of the inferred clusters (sE and sF) separately included animals from the closed strains Guadyerbas and Torbiscal, corresponding to clusters s3 and s5 obtained in the analysis at group level; ii) the largest cluster (sD) was a pool of Iberian genotypes merging all the animals pre-assigned to the Entrepelado variety (27 pigs), 33 of the 50 Retinto pigs and 9 of the 30 Lampiño pigs; iii) most of the Duroc pigs (47 animals) were grouped into the same cluster sA; iv) the remaining six small clusters corresponded to individuals sampled from isolated breeding nuclei of Duroc and Iberian pigs.
The results of the admixture analysis based on clustering at individual level should be examined cautiously because a minimum cluster size (30 individuals) was imposed. As a consequence, the possible genetic origins of individuals were reduced to four clusters, and 17 Duroc and 38 Iberian pigs, grouped into the smaller clusters (sB, sC, sG, sH, sI and sJ), were removed from the analysis. As in the previous analysis at group level, no admixed individuals were identified in clusters sA (Duroc), sE (Guadyerbas) and sF (Torbiscal). According to the corresponding BF values, 17 of the 69 pigs grouped into cluster sD showed substantial evidence against the hypothesis of pure ancestry (BF > 3.2), with 3/27, 9/33 and 5/9 being the respective proportions of admixed animals for pigs pre-assigned to the varieties Entrepelado, Retinto and Lampiño. Strong evidence against the no-admixture hypothesis (BF > 10) was found for eight pigs particularly atypical of the Retinto and Lampiño varieties (Table 5).

MC1R genotypes
The complementary analysis, based on the private alleles of the coat color MC1R gene, revealed some results that were unexpected under the hypothesis of strict isolation between the traditional varieties of Iberian pigs (Table 6): i) the presence in some Lampiño pigs of the MC1R*6 allele, jointly with the MC1R*3 allele, characteristic of black populations (Kijas et al., 1998), and ii) the presence of MC1R*3 alleles in some Retinto pigs, jointly with the MC1R*6 or MC1R*7 alleles, characteristic of red populations (Fernández et al., 2004). However, the joint segregation of MC1R*3, MC1R*6 or MC1R*7 alleles in Torbiscal pigs could be expected according to their genetic origin. The presence of the black MC1R*3 allele in the red Retinto, Torbiscal and Entrepelado populations refutes the assumed dominance of this allele.

Discussion
The Iberian breed had its origin long before the period of development of European breeds, from the end of the 1700s to the beginning of the 1900s, which was mainly based on racial standards and herdbooks controlled by breed societies. For centuries, this pig population was extensively farmed in the sparse woodlands of the southwest of the Iberian Peninsula to satisfy the high demand for animal fats. With no selective preponderance of any group of breeders and scarce gene flow between herds, its genetic singularity developed through a process of adaptation to harsh environmental conditions derived from the seasonal availability of feeding resources and a semiarid continental climate. Besides empirical selective breeding, demographic fluctuations and population isolation have been other moulding influences on the within-breed differentiation of locally distributed varieties, with rare gene flow between herds. This heterogeneity was acknowledged in the breed standard type, which was only proposed during the past century. As a consequence of the census reduction, the old reticular

Table 5.
Results of admixture analysis conditional on clusters inferred by mixture clustering analysis at individual level: Bayesian posterior mode estimates of the proportion of the genome [qi(m)] that belongs to the cluster si in pigs with strong evidence against the hypothesis of no admixture (Bayes Factor > 10)

According to mtDNA studies, at least the maternal contribution of Asian pigs to the Iberian breed seems unlikely, although East Asian pigs have contributed to the development of most of the European breeds (Alves et al., 2003). The first goals of this paper were to analyse the current genetic structure of the Iberian breed and the persistence of the old varieties potentially subjected to the damaging effects of genetic erosion. In this sense, the clustering analysis at group level revealed a partition of only four clusters for the Iberian breed, with the extant Entrepelado and Retinto varieties grouped in the same genetic cluster. However, the range of plausible values for the number of clusters (k) has the number of sampled populations (NP) as its upper bound, and therefore this analysis cannot detect underlying substructure if it occurs within a pre-defined population (Manel et al., 2005). The clustering analysis at individual level overcomes this restriction, and its results concerning the Iberian pigs delineated a more detailed partition of seven genetic clusters. The largest cluster grouped 69 pigs pre-assigned to the Entrepelado, Retinto and Lampiño varieties, suggesting shared ancestral origins for these pigs, regardless of their phenotypic dissimilarities. Finally, the results of the admixture analysis showed a noticeable number of pigs pre-assigned to these three varieties with important proportions of their genome proceeding from recent admixture events. The clearest result of the analyses was the large blending detected among the ancient varieties.
The Entrepelado variety is an emergent type of Iberian pig, whose origin is an intriguing topic. Martínez et al. (2000) reported dendrograms based on genetic distances between Iberian pig varieties, in which Retinto and Entrepelado samples appear mixed in the same cluster and the Lampiño samples were clustered into a different group. However, Diéguez (2001) hypothesized that the Entrepelado variety should proceed from the intercross between Iberian Retinto and Lampiño pigs. This genetic origin could explain the joint segregation of MC1R*3, MC1R*6 or MC1R*7 alleles in Entrepelado pigs and the present clustering results. Moreover, Alves et al. (2003) found two mtDNA haplotypes, based on Cyt B and D-loop sequences, simultaneously present in Entrepelado and in either Lampiño or Retinto pigs, suggesting maternal ancestries from both traditional varieties.
All methods based on cluster analysis involve some uncertainty unless the true populations are strongly divergent, as is the case for the Duroc pigs grouped in clusters s1 or sA, and for the preserved Guadyerbas and Torbiscal strains grouped into unique clusters. Introgression of Duroc genes is the most important risk of genetic pollution for the Iberian breed, and breed-specific markers based on polymorphisms found in two coat color genes (MC1R and OCA2) have been proposed to detect Duroc crossbred individuals (Fernández et al., 2004). Although no Duroc MC1R*4 allele was detected in the analyzed Iberian samples, the results of the admixture analysis showed substantial or strong evidence of Duroc alleles in pigs from two breeding nuclei of the Entrepelado and Retinto varieties. The magnitude of this proportion (close to 0.125) could indicate the presence of one Duroc pig among the eight great-grandparents of each of these admixed animals. A more detailed investigation of possible recent crossbreeding in these two breeding nuclei may be advisable.
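The arithmetic behind the value 0.125 can be made explicit with a small Python sketch: an animal has 2^g ancestors g generations back, and each of them is expected to contribute (1/2)^g of its autosomal genome. The function names are hypothetical, and the calculation gives an expectation only; realized proportions vary around it because of Mendelian sampling.

```python
def ancestors_in_generation(generations_back):
    """Number of ancestors g generations back (2 parents, 4 grandparents, ...)."""
    return 2 ** generations_back

def expected_ancestor_share(generations_back):
    """Expected autosomal genome proportion from one ancestor g generations back."""
    return 0.5 ** generations_back

# One Duroc ancestor in the third ancestral generation.
g = 3
print(ancestors_in_generation(g), expected_ancestor_share(g))
```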
In the admixture analysis, the admixture proportions in the parental and hybrid populations were assumed to be measurable, but subsequent selective breeding and genetic drift between the admixture event and the sampled individuals may change the allele frequencies, making it difficult to obtain a precise inference of admixture proportions (Bruford, 2004). The results concerning the related strains Guadyerbas and Torbiscal illustrate this point. According to the pedigree analysis, one third of the genome of the current Torbiscal pigs proceeds from the contribution of Guadyerbas ancestors to the foundation of this composite strain, kept isolated since 1963 (Rodrigáñez et al., 2000). However, 14 generations later, the admixture event has become obscured by genetic drift and no evidence of admixture was observed in the present analyses of Torbiscal pigs.
The six small clusters grouping Duroc, Lampiño or Retinto pigs from isolated breeding nuclei require a more detailed description. Clusters sB and sC group descendants of ancient Duroc-Jersey pigs imported from the United States forty years ago, and are maintained respectively by the Centro Regional de Selección y Reproducción Animal (CERSYRA) of Badajoz and by one private breeder. More interesting are clusters sG and sH, which separately group animals of two breeding nuclei of the Lampiño variety, one of them corresponding to the ancient variety Lampiño de la Serena and the other of Portuguese provenance. Finally, clusters sI and sJ separately group pigs of two breeding nuclei of the Retinto variety, both showing singular morphological traits. One of them is characterized by pigs with narrow legs and steeply angled pasterns, and the other presents a high frequency of animals showing wattles on their neck («mamellados»).
Diverse prioritisation approaches for livestock breed conservation are currently widely debated by specialists, and their application to the Iberian breed has been discussed by Fabuel et al. (2004). A rational management of the genetic resources of the Iberian breed should take account of these singular breeding nuclei with significant allele frequency differences at both nuclear and mitochondrial DNA, detected by these analyses and by the previous one of Alves et al. (2003). Besides their genetic uniqueness, they are vestiges of the old Iberian varieties. Both characteristics may justify their consideration as candidates to be preserved in well-designed conservation programmes.

Table 1.
Number of alleles (NA) and observed (Ho) and expected (He) heterozygosities of microsatellite markers in Iberian and Duroc breeds. For each marker the chromosome (Chr) localization is indicated

Table 2.
Results of mixture analysis at group level: posterior means and standard deviations of genetic distances among the five inferred clusters. Nei's distance Dm above the diagonal and Reynolds's distance DR below the diagonal

Table 4.
Results of mixture analysis at individual level: optimal partition of individual pigs S (sA,...,sJ) and individuals grouped into each cluster

Table 6.
Frequencies of MC1R alleles in the six Iberian and Duroc pig populations studied.