Genetic diversity and allelic richness in Spanish wild and domestic pig population estimated from microsatellite markers

Genetic diversity measures support conservation decisions aimed at maintaining the genetic flexibility of animal populations and breeds. Many of these measures use neutral genetic markers and are based on classical concepts such as coancestry and expected heterozygosity. As a component of genetic diversity, allelic richness is also important in conservation genetics. Its measurement requires that variation in sample size be taken into account, using either rarefaction or extrapolation techniques. Methods to estimate genetic diversity and allelic richness were compared in this study. DNA samples from 68 wild boars and 234 domestic pigs of the Duroc and Iberian breeds, including 63 animals of the Torbiscal and Guadyerbas Iberian lines, were genotyped for 18 microsatellites (one per autosome). As the results show, the ranking of these pig populations according to their contributions to diversity differs depending on the criterion used, because genetic diversity and private allelic richness do not address exactly the same type of diversity. Rarefaction and extrapolation-based techniques also produce partially discrepant results. The desirable integration of allelic richness into diversity theory still poses some unsolved difficulties.

Additional key words: Duroc, extrapolation, Iberian pig, microsatellites, private alleles, rarefaction, wild boar.


Introduction
Molecular markers can be used to explore the diversity present at the level of the individual, population, breed and species, in both wild and domestic animals (Frankham et al., 2002). Most of the analyses of genetic diversity at the breed and population level have used microsatellite markers, whose genomic locations have been defined by genetic mapping, and which are highly variable and available to define diversity across the entire genome. The application of technologies such as multi-locus microsatellite genotyping, and its potential impact on the management of genetic diversity, are extremely important challenges both for geneticists and breed managers (Bruford, 2004).
Phylogenetic techniques based on genetic distances estimated from microsatellites have been the method of choice to assess the genetic diversity of livestock breeds (Barker, 1999). Thaon d'Arnoldi et al. (1998) emphasized the analysis of genetic distances by the Weitzman (1992) approach to measure the global diversity and the marginal loss of diversity attached to each breed; conservation priorities can then be determined from these results. However, the methods employed for calculating genetic distances to infer relationships within and among breeds are poorly suited for managing within-breed genetic diversity and for setting between-breed conservation priorities (MacHugh et al., 1998). Genetic distances were developed with the concept of species in mind, and ignore important features of livestock populations: migration, short divergence period, and the small role of mutation in creating genetic differences, which are mainly attributable to selection and genetic drift (Laval et al., 2002). The high polymorphism of microsatellites makes them extremely sensitive to changes in effective population size, and they can therefore accumulate major changes in allele frequencies very rapidly. Caballero and Toro (2002) consider these methods inappropriate for within-species breed conservation because they ignore genetic variation within groups, although it may be of great importance for the management of livestock breeds. Other statistical tools, focused on the partition of genetic diversity within and between breeds and based on the classical concepts of coancestry and expected heterozygosity, have been proposed to analyse genetic diversity in populations of livestock species (Toro et al., 2007).
As a component of genetic diversity, allelic richness is an alternative criterion to measure genetic diversity, and some authors consider that this parameter is of key relevance in conservation programmes (Petit et al., 1998; Simianer, 2005; Foulley and Ollivier, 2006a). Allelic diversity is particularly important from a long-term perspective, because the limit of selection response is mainly determined by the initial number of alleles regardless of the allelic frequencies (Hill and Rasbash, 1986), and because it better reflects past fluctuations in population size. Observed allelic richness needs to be corrected for the different sample sizes of populations using rarefaction and/or extrapolation techniques (El Mousadik and Petit, 1996; Foulley and Ollivier, 2006a). Allelic diversity and gene diversity can behave rather differently; for example, a higher differentiation between populations can be observed for allelic diversity than for gene diversity (Foulley and Ollivier, 2006a). The objective of this study was to apply these new statistical tools to assess the genetic and allelic diversity in a set of wild boar and domestic pig populations, and to illustrate the difficulties of applying them to the definition of conservation priorities.

Genotypes
We genotyped DNA samples from 68 wild boars collected in diverse regions of the North, Center and South of Spain, and from 234 domestic pigs of the Iberian and Duroc breeds. The 170 sampled Iberian pigs were registered in the breed herdbook, and represent the closed lines Torbiscal (31) and Guadyerbas (32), preserved since 1944/45 in «El Dehesón del Encinar» (Oropesa, Toledo), and 15 private or public breeding nuclei of the Iberian breed (107). The 64 sampled Duroc pigs represent seven Spanish breeding nuclei, founded with animals of diverse origins (Canada, USA, Hungary, Denmark) within this cosmopolitan breed.
All the individuals were genotyped for 18 microsatellites (one on each autosome) chosen according to their reproducibility, high polymorphism and absence of null alleles. Details of DNA extraction, PCR reactions and microsatellite analysis are given in Alves et al. (2006). Genomic DNA was extracted using standard protocols from blood samples (live domestic pigs) or from lymph node or ear samples (hunted wild boars).

Analysis of genetic diversity
According to Caballero and Toro (2002), in a metapopulation subdivided into n breeds or populations, the total genetic diversity or expected heterozygosity ($H_T$) is partitioned into a component within breeds ($H_S$) and another between breeds ($H_T - H_S$). For one locus, and giving equal weight to each breed,

$$H_S = 1 - \frac{1}{n}\sum_{i=1}^{n}\sum_{k} p_{i,k}^{2}, \qquad H_T = 1 - \sum_{k}\left(\frac{1}{n}\sum_{i=1}^{n} p_{i,k}\right)^{2},$$

where $p_{i,k}$ is the frequency of allele k in breed i. Both results are averaged across loci. The between-breed component of genetic diversity ($H_T - H_S$) is also the average Nei's minimum distance between populations ($\bar{D}$). Wright's (1969) fixation index is simply the proportion of diversity between breeds relative to the total diversity:

$$F_{ST} = \frac{H_T - H_S}{H_T} = \frac{\bar{D}}{H_T}.$$

Hedrick (2005) has proposed to standardise it by the maximum level that can be obtained given the observed heterozygosity within breeds, $F_{ST(\max)} = (1 - H_S)/(1 + H_S)$. Thus, $F'_{ST} = F_{ST}/F_{ST(\max)}$ is a measure of population differentiation relative to the maximum possible.
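The partition described above can be sketched for a single locus as follows. This is only an illustrative sketch: it assumes equal weights for all populations, and the function name `partition_heterozygosity`, the dictionary layout and the toy frequencies are hypothetical and not part of the original analysis. In practice the values would be averaged across loci.

```python
from typing import Dict, List

Freqs = Dict[str, float]  # allele label -> frequency within one population


def partition_heterozygosity(pops: List[Freqs]) -> Dict[str, float]:
    """Partition expected heterozygosity at one locus (equal population weights)."""
    n = len(pops)
    alleles = set().union(*pops)
    # H_S: one minus the average within-population homozygosity.
    h_s = 1.0 - sum(p * p for pop in pops for p in pop.values()) / n
    # H_T: one minus the homozygosity of the averaged allele frequencies.
    mean_freq = [sum(pop.get(a, 0.0) for pop in pops) / n for a in alleles]
    h_t = 1.0 - sum(p * p for p in mean_freq)
    d_bar = h_t - h_s                      # between-population component
    f_st = d_bar / h_t if h_t > 0 else 0.0
    f_st_max = (1.0 - h_s) / (1.0 + h_s)   # maximum F_ST as quoted in the text
    return {"H_S": h_s, "H_T": h_t, "D": d_bar, "F_ST": f_st,
            "F_ST_max": f_st_max, "F_ST_std": f_st / f_st_max}


# Toy data (hypothetical): two populations fixed for different alleles.
fixed = partition_heterozygosity([{"A": 1.0}, {"B": 1.0}])
# Toy data (hypothetical): two populations with identical frequencies.
same = partition_heterozygosity([{"A": 0.5, "B": 0.5}, {"A": 0.5, "B": 0.5}])
print(fixed["F_ST"], same["F_ST"])
```

With populations fixed for different alleles all the diversity is between populations, whereas identical populations show no differentiation.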
The contribution of each of these populations to the whole population diversity can also be specified (see Caballero and Toro, 2002). Another way of studying the relevance of the different breeds or populations to the overall diversity is, following Petit et al. (1998), to calculate the loss or gain of diversity if one population is removed from the whole set.
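A minimal sketch of the removal approach is given below for one locus. It assumes equal population weights and expresses the change relative to the value obtained with all populations, which is one possible reading of the percentages; the names `diversity` and `removal_effect` and the toy frequencies are hypothetical.

```python
from typing import Dict, List, Tuple

Freqs = Dict[str, float]  # allele label -> frequency within one population


def diversity(pops: List[Freqs]) -> Tuple[float, float, float]:
    """Return (H_S, D, H_T) at one locus with equal population weights."""
    n = len(pops)
    alleles = set().union(*pops)
    h_s = 1.0 - sum(p * p for pop in pops for p in pop.values()) / n
    h_t = 1.0 - sum((sum(pop.get(a, 0.0) for pop in pops) / n) ** 2
                    for a in alleles)
    return h_s, h_t - h_s, h_t


def removal_effect(pops: List[Freqs], i: int) -> Dict[str, float]:
    """Percentage change in H_S, D and H_T when population i is dropped."""
    full = diversity(pops)
    reduced = diversity(pops[:i] + pops[i + 1:])
    # Assumption: change is relative to the value with all populations;
    # a zero baseline gives NaN instead of a division error.
    return {name: 100.0 * (r - f) / f if f else float("nan")
            for name, f, r in zip(("H_S", "D", "H_T"), full, reduced)}


# Toy data (hypothetical): population 0 is fixed for allele "a",
# the other two populations are identical and fully polymorphic.
pops = [{"a": 1.0}, {"a": 0.5, "b": 0.5}, {"a": 0.5, "b": 0.5}]
effect = removal_effect(pops, 0)
print(effect)
```

In this toy case, dropping the fixed population removes all between-population diversity but raises both the within-population and the total heterozygosity, because the remaining pooled allele frequencies are more even.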

Analysis of allelic diversity
Let $N_i$ be the total number of genes sampled from breed i at one locus, i.e. twice the number of animals sampled from that breed for that locus, and $N_{ik}$ the number of copies of the kth allele found in this sample of size $N_i$. The allelic richness at one locus estimated by rarefaction is defined as the expected number of different alleles that the sample would contain if its size had been g genes instead of $N_i$ ($g \le N_i$). The rarefaction estimate of allelic richness $R_i$ is then

$$R_i = \sum_{k}\left(1 - P_{ikg}\right), \qquad P_{ikg} = \binom{N_i - N_{ik}}{g}\Big/\binom{N_i}{g},$$

where $P_{ikg}$ is the probability that allele k does not occur in a sample of g genes chosen as reference; the numerator is the number of combinations of g genes that do not include allele k, and the denominator is the number of possible combinations of g genes taken from the sample of $N_i$ genes.

An extension of the rarefaction method to count private alleles (alleles present in a given breed but absent from all the others) was defined by Kalinowski (2004) as

$$PR_i = \sum_{k}\left[\left(1 - P_{ikg}\right)\prod_{j \ne i} P_{jkg}\right].$$

Foulley and Ollivier (2006a) proposed an alternative method, based on extrapolation, that consists of adding to the number of alleles actually observed in a sampled population the expected number of alleles missing, given the number of genes examined in the sample and the allelic frequencies observed in the whole set of populations. If $K_i$ is the number of alleles in the sample of population i, K is the total number of alleles sampled in the whole set of populations and $\pi_k$ the frequency of the kth allele in the whole population, the allelic richness of breed i with a sample size $N_i$ estimated by extrapolation would be

$$R_i = K_i + \sum_{k=1}^{K-K_i}\left(1 - \pi_k\right)^{N_i},$$

where the summation is over the subset of alleles actually missing in the sample. Therefore, the allelic richness obtained for each population with this method will be a value between the actual number of different alleles sampled in the population, $K_i$, and the total number of alleles in the whole set of populations, K.
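The three estimators described above can be sketched for one locus as follows. This is an illustrative sketch under stated assumptions, not the software used in the study: the extrapolation term uses the probability $(1 - \pi_k)^{N_i}$ that a missing allele escaped sampling, $\pi_k$ is taken as the frequency in the pooled allele counts, and all function names and toy counts are hypothetical.

```python
from math import comb
from typing import Dict, List

Counts = Dict[str, int]  # allele label -> number of copies in one sample


def absent_prob(copies: int, genes: int, g: int) -> float:
    """Probability that an allele is not drawn among g of the sampled genes."""
    if g > genes:
        raise ValueError("g must not exceed the number of sampled genes")
    return comb(genes - copies, g) / comb(genes, g)


def rarefied_richness(counts: Counts, g: int) -> float:
    """Expected number of distinct alleles in a subsample of g genes."""
    n = sum(counts.values())
    return sum(1.0 - absent_prob(c, n, g) for c in counts.values())


def private_rarefied_richness(pops: List[Counts], i: int, g: int) -> float:
    """Expected number of alleles seen in population i and in no other one."""
    sizes = [sum(p.values()) for p in pops]
    total = 0.0
    for allele, c in pops[i].items():
        term = 1.0 - absent_prob(c, sizes[i], g)
        for j, other in enumerate(pops):
            if j != i:
                term *= absent_prob(other.get(allele, 0), sizes[j], g)
        total += term
    return total


def extrapolated_richness(pops: List[Counts], i: int) -> float:
    """Observed alleles plus the expected number missed by sampling."""
    pooled: Counts = {}
    for pop in pops:
        for allele, c in pop.items():
            pooled[allele] = pooled.get(allele, 0) + c
    n_all = sum(pooled.values())
    n_i = sum(pops[i].values())
    observed = {a for a, c in pops[i].items() if c > 0}
    # Assumption: pi_k is the allele frequency in the pooled sample.
    missed = sum((1.0 - pooled[a] / n_all) ** n_i
                 for a in pooled if a not in observed)
    return len(observed) + missed


# Toy data (hypothetical): two populations with four sampled genes each.
pops = [{"A": 2, "B": 2}, {"A": 4}]
r0 = rarefied_richness(pops[0], 2)
pr0 = private_rarefied_richness(pops, 0, 2)
re1 = extrapolated_richness(pops, 1)
print(r0, pr0, re1)
```

When g equals the sample size, `rarefied_richness` returns the observed number of alleles, so rarefaction can only lower the count, while extrapolation can only raise it.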
A partition of diversity within and between populations can also be made considering allelic diversity instead of gene diversity. Foulley and Ollivier (2006a) proposed a partition analogous to that advocated by Petit et al. (1998) for gene diversity. Irrespective of the correction for sample size in the allelic richness of populations, the contribution of each population i to the total allelic richness would be $CT_i = TR - R(S/i)$, where TR is the allelic richness of the whole set of populations considered as a single one, and R(S/i) is the allelic richness of this whole set excluding population i; note that TR = K when the extrapolation method is applied. R(S/i) is estimated by considering as a single population the subset S of n − 1 populations from which breed i is excluded, and then calculating the allelic richness of S, either by rarefaction or by extrapolation, as explained before. Both the extrapolation-based contribution ($CTE_i$) and the rarefaction-based contribution ($CTR_i$) may be considered as estimates of the number of private alleles in population i, corrected for sample size. An extrapolation-based contribution of breed i to the within-breed allelic richness can be defined in an analogous way.

A measure of allelic richness dissimilarity (or distance) may be defined as the expected number of alleles present in a population i and absent in population j, calculated by the rarefaction technique as

$$d(i,j) = \sum_{k}\left(1 - P_{ikg}\right)P_{jkg},$$

where $P_{ikg}$ is the probability that allele k does not occur in a sample of g genes from population i, and the summation is over all observed alleles of i and j. A similar distance can be derived from the extrapolation-based allelic richness (Foulley and Ollivier, 2006b). Neither the rarefaction-based nor the extrapolation-based dissimilarity is symmetric, since in general d(i, j) ≠ d(j, i). Irrespective of the correction technique, symmetric dissimilarities may be obtained by expressing d(i, j) as the difference between the allelic richness of a population ($R_i$) and the joint allelic richness of i and j ($R_{ij}$), and similarly for d(j, i). $R_{ij}$ is calculated by either the rarefaction or the extrapolation method, considering i and j together as a single population. The symmetric dissimilarity is then obtained from d(i, j) and d(j, i).
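The contribution $CT_i = TR - R(S/i)$ and the asymmetric rarefaction-based dissimilarity can be sketched for one locus as shown below. The sketch assumes that the pooled sets are rarefied to the same g as the individual populations and that d(i, j) is the sum, over alleles, of the probability of being present in a subsample from i and absent from a subsample from j; all function names and toy counts are hypothetical.

```python
from math import comb
from typing import Dict, List

Counts = Dict[str, int]  # allele label -> number of copies in one sample


def absent_prob(copies: int, genes: int, g: int) -> float:
    """Probability that an allele is not drawn among g of the sampled genes."""
    return comb(genes - copies, g) / comb(genes, g)


def pool(pops: List[Counts]) -> Counts:
    """Merge several samples into a single population of allele counts."""
    merged: Counts = {}
    for pop in pops:
        for allele, c in pop.items():
            merged[allele] = merged.get(allele, 0) + c
    return merged


def rarefied_richness(counts: Counts, g: int) -> float:
    """Expected number of distinct alleles in a subsample of g genes."""
    n = sum(counts.values())
    return sum(1.0 - absent_prob(c, n, g) for c in counts.values())


def total_contribution(pops: List[Counts], i: int, g: int) -> float:
    """CT_i = TR - R(S/i), with both terms rarefied to g genes."""
    tr = rarefied_richness(pool(pops), g)
    r_without_i = rarefied_richness(pool(pops[:i] + pops[i + 1:]), g)
    return tr - r_without_i


def dissimilarity(pop_i: Counts, pop_j: Counts, g: int) -> float:
    """Expected number of alleles present in i and absent in j (not symmetric)."""
    n_i, n_j = sum(pop_i.values()), sum(pop_j.values())
    return sum((1.0 - absent_prob(pop_i.get(a, 0), n_i, g))
               * absent_prob(pop_j.get(a, 0), n_j, g)
               for a in set(pop_i) | set(pop_j))


# Toy data (hypothetical): two populations with four sampled genes each.
pops = [{"A": 2, "B": 2}, {"A": 4}]
ct = [total_contribution(pops, i, 2) for i in range(len(pops))]
d01 = dissimilarity(pops[0], pops[1], 2)
d10 = dissimilarity(pops[1], pops[0], 2)
print(ct, d01, d10)
```

In this toy example the population without private alleles receives a negative rarefied contribution, and d(0, 1) differs from d(1, 0), which illustrates the asymmetry mentioned in the text.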

Results and Discussion
Table 1 shows the number of alleles detected for each population and genotyped microsatellite, arranged by pig autosome. This number varies from 1 to 12 (average 5.3 alleles) for the total set of populations. The number of alleles is higher in the wild boar population and in the Duroc and Iberian breeds, and lower in the Guadyerbas and Torbiscal closed lines of the Iberian breed. Note that these two related lines have been preserved in a conservation programme established in 1944-1945 from four ancient Spanish and Portuguese Iberian strains. Torbiscal is the result of blending the four ancient founder populations (Fernández et al., 2002), and Guadyerbas is one of these founder populations (Toro et al., 2000).

Analysis of genetic diversity
The total heterozygosity of the whole set of populations is $H_T = 0.730$, with components within breeds $H_S = 0.609$ and between breeds $\bar{D} = 0.122$. The amount of differentiation is $F_{ST} = 0.167$, the maximum possible being $F_{ST(\max)} = 0.243$, so that the standardized value is in this case $F'_{ST} = 0.687$.
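As a check on the arithmetic, the reported values can be substituted into the definitions given in the methods. The inputs are the published three-decimal figures, so the last digit of each result may be affected by rounding; for instance, 0.730 − 0.609 = 0.121, which presumably differs from the reported between-breed value of 0.122 only through rounding.

```latex
\begin{align*}
F_{ST} &= \frac{\bar{D}}{H_T} = \frac{0.122}{0.730} \approx 0.167,\\
F_{ST(\max)} &= \frac{1 - H_S}{1 + H_S} = \frac{1 - 0.609}{1 + 0.609}
             = \frac{0.391}{1.609} \approx 0.243,\\
F'_{ST} &= \frac{F_{ST}}{F_{ST(\max)}} \approx \frac{0.167}{0.243} \approx 0.687.
\end{align*}
```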
The contributions of each of these populations or breeds to the whole population diversity are shown in Table 2. For example, the pigs representing the Iberian breed contribute the most (23%) to the within-population diversity component ($H_S$), but relatively little (14%) to the between-population component ($\bar{D}$). In contrast, the Guadyerbas line contributes the least to the within-population variation (13%), but substantially (25%) to the averaged between-population variation.
The loss or gain of genetic diversity when one population is removed from the whole set, expressed as a percentage of the recalculated genetic diversity, is presented in Table 3. For example, the removal of the Iberian breed causes a decrease of 4.1% in the within-population variation, but an increase of 11.6% in the between-population diversity. In contrast, the removal of the genetically isolated Guadyerbas line involves a gain in the total (3.0%) and in the within-population variation (8.5%), but a substantial loss (24.0%) in the between-population component. Although it may seem paradoxical that the removal of one population could increase total genetic diversity, gene frequencies may become more even once that population is removed, and this increases the expected heterozygosity. The population with the highest rank for genetic diversity in this particular scenario turns out to be the Duroc breed, whose removal implies a loss of both within-population (2.9%) and between-population (19.3%) genetic diversity. The removal of the wild boar, which has not been subjected to artificial selection, also implies a loss of 2.9% of the within-population genetic diversity, despite its second position in the ranking.

Analysis of allelic diversity
The observed values of allelic richness and private allelic richness for the different populations are shown in Table 4. The observed numbers of alleles reflect events in the history of each studied population. However, allelic richness also depends on sample size, and the observed values need to be corrected by rarefaction or extrapolation techniques. Note that the rarefaction method estimates the allelic richness for a uniform number of sampled genes (g), and is applicable only when sample sizes are unequal. In the present analysis, g coincides with the smallest sample size (Torbiscal line, g = 62), and the application of the rarefaction method clearly reduces the allelic richness of the other populations: the estimated $R_R$ values are always lower than the observed $R_O$ values. This general reduction of the number of alleles is associated with an increase of the mean number of private alleles per locus ($PR_R$) in four of the five studied populations (Table 4).
As expected, the extrapolation method provides estimates clearly exceeding the observed richness: the $R_E$ values are always greater than the $R_O$ values. This increase in the number of alleles is related to a decrease in the number of private alleles per locus ($PR_E$) in each population (Table 4). An assumption of this approach is that all the populations or breeds are drawn at random from the same founder population (Foulley and Ollivier, 2006a). Thus, if one allele is missing in a given population, the only reason considered is the small sample size, ignoring the possibility that the allele could have been truly lost by genetic drift or other genetic causes. This can be a questionable assumption when analyzing livestock breeds.
Irrespective of the correction for sample size, the definition of the contribution of each population i to the allelic richness [$CT_i = TR - R(S/i)$] implies that only those populations with private alleles contribute to the allelic diversity. The results of the partition of these contributions within and between populations are presented in Table 5, showing that the wild boar, Duroc and Iberian populations contribute positively to the total private allelic richness, whereas the contributions of the populations with fewer private alleles (Torbiscal and Guadyerbas) are close to zero ($CT_O$ and $CT_E$) or even negative ($CT_R$). This strong discrepancy between the results of the rarefaction and extrapolation methods may also be appreciated for the $CW_R$ and $CW_E$ values (Toro et al., 2007). The $CB_O$, $CB_R$ and $CB_E$ values are also presented in Table 5, but these results seem to lack an intuitive justification. Finally, the allelic richness symmetric dissimilarities (1 − S) among the pig populations were calculated by rarefaction (DRR) and extrapolation (DRE) techniques. Genetic distances were also calculated on the same populations, using either the standard Reynolds distance (Reynolds et al., 1983) or Nei's minimum distance (Nei, 1972). Table 6 shows the correlations between these measures of genetic distance and allelic richness dissimilarity, which can be summarized as follows: a) high correlation between the two genetic distances, b) correlation values between distances and dissimilarities close to 0.80 for DRE, and markedly lower for DRR, and c) medium correlation between DRR and DRE dissimilarities. These results confirm that genetic diversity and private allelic richness do not address exactly the same type of diversity. Whereas the first one is based on the expected heterozygosity, the second one provides a measure of the singularity of a population in terms of allele numbers.

Table 4.
Allelic richness (R) and private allelic richness (PR) of 18 microsatellite loci in the wild boar population, Duroc and Iberian breeds and Guadyerbas and Torbiscal Iberian lines: total number of observed alleles ($R_T$) and private alleles ($PR_T$), and mean number of alleles and private alleles per locus observed ($R_O$, $PR_O$), estimated by rarefaction ($R_R$, $PR_R$) with uniform sample sizes of g = 62, and estimated by extrapolation ($R_E$, $PR_E$)

Number of alleles as a measure of maximal diversity
From the above, it seems that the formal treatment of allelic diversity is not clear, and that its partition into components within and between populations lacks an intuitive interpretation. Furthermore, only private alleles contribute to the allelic diversity of the system, and it would be interesting to have a procedure that takes into account the allelic diversity of populations even if they do not have private alleles. Finally, a procedure implemented in a similar way to the classical partition of heterozygosity in population genetics would also be desirable. Toro et al. (2008) have proposed an alternative based on the idea that the larger the number of alleles, the larger the potential diversity of a breed, because maximal diversity occurs when alleles are at equal frequencies. Therefore, a way of highlighting the importance of alleles is to assume that all alleles present in a population have identical frequencies and then calculate the standard partition of expected heterozygosity. The estimate of gene diversity in this situation takes allelic diversity into account by considering the potential of each breed according to the number and type of alleles that it carries.
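The idea of equalizing allele frequencies before partitioning heterozygosity can be sketched for one locus as follows. This is a sketch under the assumption of equal population weights; the names `equalize` and `diversity` and the toy frequencies are hypothetical and are not taken from Toro et al. (2008).

```python
from typing import Dict, List, Tuple

Freqs = Dict[str, float]  # allele label -> frequency within one population


def equalize(pop: Freqs) -> Freqs:
    """Give every allele present in the population the same frequency."""
    present = [a for a, p in pop.items() if p > 0]
    return {a: 1.0 / len(present) for a in present}


def diversity(pops: List[Freqs]) -> Tuple[float, float, float]:
    """Return (H_S, D, H_T) at one locus with equal population weights."""
    n = len(pops)
    alleles = set().union(*pops)
    h_s = 1.0 - sum(p * p for pop in pops for p in pop.values()) / n
    h_t = 1.0 - sum((sum(pop.get(a, 0.0) for pop in pops) / n) ** 2
                    for a in alleles)
    return h_s, h_t - h_s, h_t


# Toy data (hypothetical): each population carries one private allele.
pops = [{"A": 0.9, "B": 0.1}, {"A": 0.5, "C": 0.5}]
observed = diversity(pops)
potential = diversity([equalize(p) for p in pops])
print(observed, potential)
```

After equalization the result depends only on which alleles each population carries, so a rare allele counts as much as a common one.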
Applying the same calculations as above, the total heterozygosity of the whole set of breeds is $H_T = 0.831$, with components within breeds $H_S = 0.746$ and between breeds $\bar{D} = 0.085$, the amount of differentiation being $F_{ST} = 0.102$. The proportional contributions of each population to the whole population diversity are shown in Table 7. The wild boar and the Iberian breed contribute almost the same to the within-breed diversity because both populations have almost the same number of alleles. However, the wild boar contributes more to the between-breed diversity because it possesses more private alleles. The overall balance indicates that the wild boar contributes more than the Iberian to the total diversity. Of the last two populations, Guadyerbas and Torbiscal, the second is the one that contributes more to the diversity because, although both have just one private allele, it has a larger number of alleles.
The losses or gains of diversity when one population is removed, assuming identical allelic frequencies, are shown in Table 8. Note that, according to this criterion, the most relevant population is again the wild boar. However, the second one is the Duroc breed instead of the Iberian one, reflecting the fact that it contains more private alleles in this particular scenario.
The previous paragraphs have mainly focused on the partition of the total diversity of a metapopulation into two components: between and within populations. When applied to genetic diversity or expected heterozygosity, this partition can be easily done following the standard population genetics approach. However, allelic richness as a relevant measure of diversity is more difficult to deal with. First, it should be standardized to a uniform sample size using either the rarefaction or the extrapolation technique, whose respective results may present some discrepancy. Second, in spite of recent theoretical developments (Ollivier and Foulley, 2005), it is far from obvious how to partition allelic richness into between- and within-population components. As our results show, the ranking of the populations according to their contributions to diversity differs depending on the criterion used. Although in our case this ambiguity is of academic interest, given that wild and domestic animals were analysed jointly, it could be more worrying in other settings such as the prioritization of domestic breeds for conservation (Toro et al., 2007).

Table 1.
Number of alleles ($N_A$) and observed ($H_O$) and expected ($H_E$) heterozygosities of microsatellite markers in the wild boar population (n = 68), Duroc (n = 64) and Iberian (n = 107) pig breeds and Guadyerbas (n = 32) and Torbiscal (n = 31) Iberian lines

Table 1 also gives, for each locus and population, the observed heterozygosity by direct count ($H_O$) and the expected heterozygosity ($H_E$) under Hardy-Weinberg equilibrium. In wild boar,

Table 2.
Contribution of each pig population to the within-breed ($H_S$), between-breed ($\bar{D}$) and total genetic diversity ($H_T$).

The most noticeable result is the high number of private alleles actually observed in the wild boar and Duroc populations: 24 and 16, respectively. In spite of its similar observed allelic richness and larger sample size, the Iberian breed presents only six private alleles. This may be explained by the presence, among the populations considered, of two other Iberian lines, which share alleles present in the gene pool of the Iberian breed and absent from the wild boar and Duroc populations.

Table 5.
Population contributions to total, within-population and between-population private allelic richness: observed ($CT_O$, $CW_O$, $CB_O$), estimated by rarefaction ($CT_R$, $CW_R$, $CB_R$) and estimated by extrapolation ($CT_E$, $CW_E$, $CB_E$)

$CT_i - CW_i$, but the meaning of this parameter is unclear

Table 6.
Correlations between Reynolds ($D_{REY}$) and Nei ($D_{NEI}$) genetic distances and allelic richness dissimilarities calculated by rarefaction (DRR) and extrapolation (DRE) techniques among the five pig populations

Table 7.
Contribution of each pig population to the within-breed (H

The contribution of the Duroc is lower than the previous ones because, although it has more private alleles than the Iberian, it harbours fewer alleles.

Table 8.
Loss (−) and gain (+) in percentage (%) of genetic diversity, assuming identical allelic frequencies, when each population is removed from the whole set