Identification of gene pools used in restoration and conservation by chloroplast microsatellite markers in Iberian pine species

Aim of study: To contribute to the characterization of the origin of material used in afforestation, restoration or conservation activities by using cpSSR markers. Area of study: We used information from the natural range of Iberian pines in Spain. Material and methods: We used Iberian pines as an example to undertake gene pool characterization based on a wide Iberian sample of 97 populations from five Pinus species (Pinus halepensis, Pinus pinaster, Pinus nigra, Pinus sylvestris and Pinus uncinata). Haplotypes were obtained for each analyzed tree (derived from nine chloroplast microsatellite markers in P. halepensis and six in the rest of the species). Based on this information we subdivided each species into regions (considering both genetic structure and its application in afforestation, restoration and conservation programs) and tested the assignment of populations to the different groups based on the genetic distance among samples. Main results: The rate of successful identification of populations among the different species was very high (> 94 %) for P. nigra, P. sylvestris and P. uncinata, high (81 %) for P. pinaster, and low (< 65 %) for P. halepensis. Research highlights: Chloroplast DNA markers from extensive population datasets can be used to assign the origin of forest reproductive material in some pine species. Additional keywords: genetic distance; region of provenance; fingerprinting. Abbreviations used: cpSSR (chloroplast microsatellite); FCT (variance components of the population permutated among groups); FSC (variance components of genotypes permutated among populations within groups); FST (variance components of genotypes permutated among populations and among groups); k (number of groups). Authors' contributions: Conceived, designed and performed the study: RA, EHT, JH. Analyzed the data: EHT, ZL, RA. Contributed analysis tools: MN. Wrote the paper: EHT, JH, ZL, MN, RA. Coordinated the research project: RA, JH.
Citation: Hernández-Tecles, E.; de las Heras, J.; Lorenzo, Z.; Navascués, M.; Alía, R. (2017). Identification of gene pools used in restoration and conservation by chloroplast microsatellite markers in Iberian pine species. Forest Systems, Volume 26, Issue 2, e05S, 10 pages. eISSN: 2171-9845. https://doi.org/10.5424/fs/2017262-9030 Received: 23 Nov 2016. Accepted: 15 Sep 2017. Copyright © 2017 INIA. This is an open access article distributed under the terms of the Creative Commons Attribution (CC-by) Spain 3.0 License. Funding: Spanish Ministry of Competitivity and Innovation (RTA2013-00048-C03-01; PCIN-2014-138); Spanish Ministry of Agriculture and Forestry (AEG06-02). Competing interests: The authors have declared that no competing interests exist. Correspondence should be addressed to Ricardo Alía: alia@inia.es


Introduction
When managing degraded ecosystems the objective is to restore them to normal functioning (Holmes & Richardson, 1999), and therefore restoration and conservation activities must be considered complementary. In afforestation and restoration activities, one of the main concerns is the use of a suitable species pool (Jones, 2003), i.e. the set of species that can potentially inhabit a site with the local ecological conditions (Zobel et al., 1998; Pärtel et al., 2011; García del Barrio et al., 2013). Special emphasis is being placed on plant species diversification, which involves a better use of the large pool of native species available when using an ecosystem-oriented approach (Bautista et al., 2009). In most cases, local species are promoted, according to basic principles of restoration ecology (Anonymous, 2007), with aims that include carbon sequestration and reduction of net CO2 emissions. Local adaptation has been invoked to justify the use of local material (Mckay et al., 2005). However, it has been argued that local material is seldom the ideal solution in degraded or non-productive areas (Jones & Monaco, 2009), and that its role is still unclear, e.g. the role of local provenance in reintroductions (Sutherland et al., 2006).
Forecasting the performance of forest species used in restoration and conservation activities relies on their intra-specific variability (Langlet, 1971; Van Andel, 1998). Due to the importance of this interpopulation differentiation, the core marketing unit of source-identified forest reproductive materials in the certification schemes is the region of provenance, or seed zone (Nanson, 2001). Therefore, knowing how populations differ, both in their levels of diversity for neutral markers and in their variability for important adaptive and performance traits (e.g. growth, tolerance to biotic and abiotic stresses), is essential in afforestation and restoration activities (Alía et al., 2009b).
Research has reinforced the idea that the forest material supplied for native woodland creation and restoration should come from a seed source that is both genetically and ecologically appropriate for the planting site (Mckay et al., 2005; Leimu & Fischer, 2008; Vander-Mijnsbruggea, 2010; Sgrò et al., 2011; Breed et al., 2012; Bucharova et al., 2016), and that is also adapted to future climatic conditions (Konnert et al., 2015). In conservation activities, in particular when managing conservation units or reinforcement activities, the origin of the material is also crucial, as the risk of genetic introgression and/or gene flow from undesirable origins or non-local material is a major concern for natural populations and their regeneration (Moritz, 1994; Moritz, 1999; Robledo-Arnuncio et al., 2009; Steinitz et al., 2012). Taking into consideration the origin of forest reproductive material is essential for the conservation of forest genetic resources (Koskela et al., 2013), e.g. when determining the gene pool that can be used in the vicinity of conservation units (Mckay et al., 2005).
In all these cases, the possibility of checking the origin of the material is essential for an effective control of the marketing of the material used in restoration and conservation activities. Forest tree species have undergone little breeding, and it is hard to find reliable fingerprinting methods to control the use of the correct reproductive material at the population level, or to avoid marketing fraud (Nanson, 2001; Degen et al., 2010), despite the extensive use of such material. Some attempts to identify forest reproductive material have been applied to specific materials in pine species (Aragonés et al., 1997; Ribeiro et al., 2002; Deguilloux et al., 2004; Tigabu et al., 2005; Fidler et al., 2006), and different DNA fingerprinting approaches have been implemented to assign material in Quercus robur L. (Degen et al., 2010), following methods already in use for other organisms (Honjo et al., 2008). Also, some methods for origin traceability are being implemented in important tropical timber species (Tnah et al., 2009; Degen et al., 2010; Hong et al., 2010).
Different approaches have been used to identify gene pools based on genetic markers, especially those derived from Bayesian approaches (e.g. Pritchard et al., 2000; Dupanloup et al., 2002). However, these approaches usually group together populations with contrasting performance in quantitative or adaptive traits (González-Martínez et al., 2004), and populations from different regions of provenance and therefore different marketing units.
Extensive studies using chloroplast microsatellites (cpSSRs) covering the distribution range of different species are now available (e.g. Soranzo et al., 2000; Gómez et al., 2005; Bucci et al., 2007; Heuertz et al., 2010). Therefore, it would be interesting to test whether these markers constitute a reliable tool for the identification of material to be used in restoration and conservation activities. To our knowledge, no attempt has been made to use extensive information on the geographic variation of forest tree species to test whether it is possible to differentiate among regions of provenance for important groups of forest species.
Here we tested the identification of populations of various pine species with contrasting levels of differentiation (Soto et al., 2010), as an example of the use of the extensive marker datasets of populations that are becoming available (e.g. (GD)² Database: https://gd2.pierroton.inra.fr/gd2/login/login; Demiurge database: http://www.demiurge-project.org/). The species considered were Pinus halepensis Mill., Pinus pinaster Ait., Pinus nigra Arn. subsp. salzmannii (Dunal) Franco, Pinus sylvestris L., and Pinus uncinata Ram. Forest tree species play an essential role in determining many ecosystem properties and also influence the genetic diversity of associated organisms (Whitham et al., 2006). These species are highly relevant in Europe and across the Mediterranean region, as they are broadly used in afforestation, restoration and conservation programs. Moreover, extensive studies on the genetic variation of the species have been carried out (P. halepensis: Morgante et al., 1996; P. pinaster: Vendramin et al., 1998; P. pinaster: Bucci et al., 2007; P. halepensis: Grivet et al., 2009, 2013; P. pinaster and P. sylvestris: Soto et al., 2010; Unger et al., 2014), and reveal contrasting levels of variation using cpSSR markers. We used haploid cpSSR genetic markers to check the assignment of populations to different groups with application in afforestation, restoration and conservation activities.
We applied the methods to the regions of provenance of the species in the Spanish part of the Iberian Peninsula. The area has a particular climatic regime of cold, wet winters and hot, dry summers, and a long history of human activity, grazing pressure, and fires (Valbuena-Carabaña et al., 2009). Also, reforestation and afforestation activities have a long tradition, with more than 2.2 million hectares reforested with these pine species, representing 74% of the total reforested area during the period 1940-1995 (Montero, 1997).
First, we grouped the populations of each species according to their use in afforestation and conservation programs (Alía et al., 2009b). Second, we tested the probability of assignment of the different samples to the different groups, based on the Sλ genetic distance (Ribeiro et al., 2002), using a Monte Carlo method to compare each population with the defined reference groups, implemented in new software (Blue Caterpillar).

Plant material and cpSSR determination
We sampled 97 populations (Fig. 1) from five autochthonous Pinus species covering their natural distribution range in Spain: P. halepensis (14 populations), P. pinaster (33 populations), P. nigra (19 populations), P. sylvestris (26 populations) and P. uncinata (five populations). These populations are autochthonous and include the most important regions of provenance and conservation areas of interest for restoration and afforestation (Alía et al., 2009a) or conservation activities (Jimenez et al., 2009).
Several of the sampled populations have been selected for conservation purposes in these species: three in P. halepensis, nine in P. pinaster, eight in P. nigra, six in P. sylvestris and three in P. uncinata. These regions of interest in restoration and conservation activities are summarized in Figure 1.

Statistical methods
- Population grouping. We first grouped the populations according to the region of provenance for each Pinus species in Spain (Alía et al., 2009a). In a second step, we grouped those regions into gene pools based on the similarity established by the cpSSR data, so that regions of provenance with similar genetic composition were grouped together for each species. For P. pinaster and P. halepensis we used the classification by Jaramillo-Correa et al. (2010). Bayesian clustering methods provide one of the best ways to assess genetic structure in cases of unknown genetic origin (Pritchard et al., 2000; Dawson and Belkhir, 2001; Corander et al., 2003; Falush et al., 2003). Moreover, they allow the origin of each individual to be tested against different reference populations under the assumption of panmixia. For the rest of the species, we ran STRUCTURE version 2 (Pritchard et al., 2000) without prior information on the locality of origin, allowing allele frequencies to be correlated. This is the recommended configuration in the case of limited population structure (Falush et al., 2003). The number of groups (K) was set from a minimum of one to a maximum of 10, and ten simulations were run for each K-value, each with a burn-in of 100,000 followed by 100,000 iterations. Then, the mean value of the posterior probability was calculated from the ten simulations for each K, and the most likely number of clusters was selected following the methodology proposed by Evanno et al. (2005), implemented in the software STRUCTURE HARVESTER (Earl & vonHoldt, 2012). For each species, three groups were defined, but they lacked geographical correspondence.
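The selection of K described above can be sketched in a few lines of Python. This is a minimal illustration, not the STRUCTURE HARVESTER implementation: the function name `evanno_delta_k` and the input layout (a dictionary mapping each K to the ln P(D) values of its replicate runs) are assumptions made for the example, and Delta K is computed here from the mean ln P(D) per K, divided by the standard deviation across replicates.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Return {K: Delta K} from replicate ln P(D) values.

    lnp_by_k is a hypothetical input layout: {K: [lnP run 1, lnP run 2, ...]}.
    Delta K = |L(K+1) - 2 L(K) + L(K-1)| / sd(L(K)), using mean L per K.
    """
    mean_l = {k: mean(values) for k, values in lnp_by_k.items()}
    delta_k = {}
    for k in sorted(lnp_by_k):
        # Delta K is undefined at both ends of the tested range of K.
        if k - 1 not in mean_l or k + 1 not in mean_l:
            continue
        sd = stdev(lnp_by_k[k])
        if sd == 0:
            continue  # avoid division by zero when all replicates agree exactly
        second_difference = abs(mean_l[k + 1] - 2 * mean_l[k] + mean_l[k - 1])
        delta_k[k] = second_difference / sd
    return delta_k
```

The most likely number of clusters would then be the K with the largest value, e.g. `max(dk, key=dk.get)` for a result `dk`. With K tested from 1 to 10, Delta K is only available for K = 2 to 9.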
Table 1. Gene pools defined for conservation (CG), or restoration and afforestation (RG) purposes. For each gene pool the region of provenance defined for the corresponding species is specified.
An analysis of molecular variance (AMOVA, Dupanloup et al., 2002) was performed to define the distribution of genetic diversity among populations without an explicit a priori definition of population structure. AMOVA defines groups of populations that are geographically homogeneous and maximally differentiated from each other. The significance of the variance components of the population permutated among groups (FCT), of genotypes permutated among populations within groups (FSC) and of genotypes permutated among populations and among groups (FST) was tested by 1000 permutations of individuals for each of the hierarchical levels. We tested K = 2 to 10 groups of populations. The number of groups was selected according to the highest FCT value, using the sum of squared size differences between haplotypes with 1000 simulated annealing processes.
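For reference, the three fixation indices tested above are conventionally written in terms of hierarchical variance components. The symbols below are not used in the text and are introduced here as assumptions following the standard AMOVA formulation: \sigma_a^2 (among groups), \sigma_b^2 (among populations within groups) and \sigma_c^2 (within populations).

```latex
\sigma_T^2 = \sigma_a^2 + \sigma_b^2 + \sigma_c^2, \qquad
F_{CT} = \frac{\sigma_a^2}{\sigma_T^2}, \qquad
F_{SC} = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_c^2}, \qquad
F_{ST} = \frac{\sigma_a^2 + \sigma_b^2}{\sigma_T^2}
```

Under these definitions the indices are linked by (1 - F_{ST}) = (1 - F_{SC})(1 - F_{CT}), so choosing the grouping with the highest FCT amounts to maximizing the share of total variance explained by differences among groups.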
The gene pools for each species indicating the region/s of provenance included, and the number of populations sampled, are included in Table 1.
- Assignment of populations to the groups. A reference population was defined for each group by including all haplotypes of the populations from a given group.
We estimated the genetic distance Sλ, as defined by Ribeiro et al. (2002), between each of the sampled populations and each of the k groups of the different species. In that definition, n is the total number of different haplotypes found both in the reference group and in the λ population, XiR is the frequency of the ith haplotype in the reference group and Xiλ is the frequency of the ith haplotype in the λ population.
Numerical tests based on Monte Carlo methods were used to estimate the significance of the statistic (Manly, 1997). A given population could be assigned to several reference groups or to none of them. A non-significant Sλ distance close to 0 indicates that we cannot exclude the reference group as the origin of the population. For reference groups with only one population, we measured its distinctiveness in contrast with the rest of the populations. This method was implemented in the Blue Caterpillar software (https://sites.google.com/site/navascuesresearch/publicationsconferences/software/blue-caterpillar).
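The assignment test can be illustrated with a short Python sketch. It is not the Blue Caterpillar implementation, and two points are assumptions made only for this example: the distance is taken to be a sum of squared differences between haplotype frequencies (the exact form in Ribeiro et al., 2002 may differ), and the null distribution is built by resampling haplotypes with replacement from the reference group. All function names are hypothetical.

```python
import random
from collections import Counter


def haplotype_frequencies(haplotypes):
    """Relative frequency of each haplotype in a list of sampled trees."""
    counts = Counter(haplotypes)
    total = len(haplotypes)
    return {hap: count / total for hap, count in counts.items()}


def s_distance(reference, population):
    """Frequency-based distance between a reference group and a population.

    ASSUMPTION: sum of squared frequency differences over all haplotypes
    found in either sample; used here only as a stand-in for S-lambda.
    """
    f_ref = haplotype_frequencies(reference)
    f_pop = haplotype_frequencies(population)
    haplotypes = set(f_ref) | set(f_pop)
    return sum((f_ref.get(h, 0.0) - f_pop.get(h, 0.0)) ** 2 for h in haplotypes)


def monte_carlo_p_value(reference, population, n_sim=1000, seed=0):
    """Observed distance and Monte Carlo p-value.

    Null hypothesis (assumed): the population is a random sample of the
    same size drawn from the reference group.
    """
    rng = random.Random(seed)
    observed = s_distance(reference, population)
    at_least_as_large = 0
    for _ in range(n_sim):
        simulated = rng.choices(reference, k=len(population))
        if s_distance(reference, simulated) >= observed:
            at_least_as_large += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (at_least_as_large + 1) / (n_sim + 1)
```

In this sketch, a population would not be excluded from a reference group when the returned p-value is above the chosen significance threshold, which mirrors the idea that a non-significant distance close to 0 is compatible with that group being the origin.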

Results
All the pine species studied exhibited high rates of differentiation among populations (Table 2), except for P. halepensis. Using the analysis of molecular variance (AMOVA), it was possible to distinguish a number of groups that ranged from three to ten depending on the species.
Those groups were subdivided according to their distinct purpose (afforestation and restoration or conservation) (Table 1).Some of the groups included only one population (8 out of 10 in P. halepensis, 12 out of 17 in P. pinaster, 5 out of 10 in P. nigra, 4 out of 12 in P. sylvestris and 3 out of 4 in P. uncinata), with most of those populations defined as important in conservation programs.
A summary of the assignment of conservation and restoration or afforestation populations to the different genetic groups is presented in Table 3. The species with the highest rates of success in assignment were P. sylvestris and P. uncinata, and the one with the lowest rate was P. halepensis. In any case, all the species except P. halepensis presented high rates of correct assignment (> 80 %). It is interesting to note that conservation populations were correctly assigned in most cases for all the species.
Although the assignment was correct in most cases, P. halepensis and P. sylvestris are noteworthy because many of their groups are genetically close (Table 4), which could bias the assignment success rate.

Discussion
The five Spanish pine species used in this paper exhibit, due to their mating systems, a high level of genetic diversity within populations and a low to intermediate level of differentiation among populations (Soto et al., 2010). These features are known to improve resilience, productivity and recovery from climate extremes and to give stability to the ecosystem, and are therefore a key issue in the use of forest reproductive material and in conservation programs for the species (Thomson et al., 2009; Isbell et al., 2011; Alfaro et al., 2014).
We present an approach that can assign the origin of material that could be used in afforestation, restoration or conservation activities, which are essential in managing forest genetic resources. The findings showed that the method worked well for two species, P. sylvestris and P. uncinata, and also for another two, P. nigra and P. pinaster, with an assignment success rate above 85%, while P. halepensis was the species with the lowest rate of identification. To our knowledge, this is the first method that can verify the origin of material of these species, as previous research either identified specific populations (Ribeiro et al., 2002; Robledo-Arnuncio et al., 2009) or was used in the delineation of genetic zones of interest in breeding and conservation activities (Bucci and Vendramin, 2000; Bucci et al., 2007). Other methods have been developed based on the identification of both the adult population and the material obtained from it (Deguilloux et al., 2004; Degen et al., 2010).
Afforestation and restoration activities with these species are usually based on local material, meaning material from the same region of provenance, although in species with marked spatial structure local material can differ even at short distances. However, in our study the extensive gene flow allows more extensive populations to be considered. We could not clearly differentiate populations from some regions of provenance (e.g. in P. halepensis), showing that only some groups of populations could be distinguished, but we also found that in some cases populations from the same region of provenance were different. Regions of provenance have been defined mostly based on ecological and extensive genetic information (Gil et al., 1996). Information from genetic markers is not the best option to define regions with a similar pattern of variation in traits related to adaptation or growth (e.g. Hamann et al., 2000); therefore, redefinition of the limits of some regions of provenance might be needed.
The European Forest Genetic Resource Program (EUFORGEN) has defined different in-situ conservation units of the species, and the conservation program (Jimenez et al., 2009; Koskela et al., 2013) includes the definition of genetic criteria for restoration activities and for monitoring the conservation units (Aravanopoulos, 2011; Graudal et al., 2014; Fussi et al., 2016). In our case, we could distinguish almost all the conservation populations considered in the study, which will allow a better implementation of these activities in Spain.

Conclusions
We demonstrated the usefulness of the extensive marker datasets that are becoming available for the identification of gene pools at the population level in different species. Nevertheless, results depend on factors such as the genetic diversity and the differentiation within and between populations. The rate of successful identification of populations among the different species was very high (> 94 %) for P. nigra, P. sylvestris and P. uncinata, high (circa 85 %) for P. pinaster, and low (< 79 %) for P. halepensis. More information is needed for P. halepensis and for some areas in P. pinaster.

Figure 1. Location of populations used in the study: a) Pinus halepensis, b) Pinus nigra, c) Pinus sylvestris, d) Pinus pinaster and Pinus uncinata. The autochthonous range of the species is shown in green, except for P. pinaster (red) and P. uncinata (blue). The limits of regions of provenance (RG) and conservation areas (C) are shown as solid lines.

Table 3. Classification summary. For each type (Conservation, Restoration/Afforestation) we included: number of populations/number of populations well assigned/number of populations incorrectly assigned. For the species with incorrect assignments we included the code of the populations assigned to another group (*), and the code of the populations not assigned to any other group.

Table 2. Results of the AMOVA analysis, showing the number of groups distinguished, the differentiation among groups (FCT), among populations within groups (FSC) and within populations (FST).

Table 4. Mean genetic distance (Sλ) to the groups (in brackets) for which the distance was not statistically significant in the assignment of populations. The group to which the population belongs was excluded from the analysis. n.a.: no other group could be assigned.