New validated Eucalyptus SSR markers located in candidate genes involved in growth and plant development

Abstract Aim of study: To validate and characterize new microsatellite or Simple Sequence Repeat (SSR) markers, located within transcribed genomic sequences related to growth and plant developmental traits, in Eucalyptus species. Area of study: Eucalyptus species from different Australian origins planted in Argentina. Material and methods: In total, 134 SSR in 129 candidate genes (CG-SSR) involved in plant development were selected and physically mapped to the E. grandis reference genome using bioinformatic tools. Experimental validation and polymorphism analysis were performed on 48 individuals from E. grandis and interspecific hybrids (E. grandis x E. camaldulensis; E. grandis x E. tereticornis), E. globulus, E. maidenii, E. dunnii and E. benthamii. Main results: 131 out of 134 CG-SSR were mapped on the 11 chromosomes of the E. grandis reference genome. Most of the 134 analyzed SSR (> 75%) were positively amplified and 39 were polymorphic in at least one species. A search of annotated genes within 25 kbp up- and downstream of each SSR location retrieved 773 genes of interest. Research highlights: The new validated and characterized CG-SSR are potentially suitable for comparative QTL mapping, molecular marker-assisted breeding (MAB) and population genetic studies across different species within the Symphyomyrtus subgenus.


Introduction
The Eucalyptus genus comprises forest species and hybrids that are widely cultivated worldwide in forestry plantations as renewable sources for timber, paper and pulp production (Govindan, 2005; Bauhus et al., 2010). The high number of species within this genus, with different agroecological requirements and wood quality characteristics, makes Eucalyptus a very valuable resource that can be adapted to numerous ecosystems worldwide. In this context, the global area planted with Eucalyptus is estimated at 20 million hectares (Wingfield et al., 2015).
Over the last 20 years, forest breeding programs, which require long developmental periods, have incorporated different molecular tools (Gudeta, 2018). Molecular markers detect differences between individuals directly from the genome and provide extensive discrete data that can be useful for statistical analyses. These tools are suitable for controlling genetic traceability during multiplication processes, evaluating genetic diversity and making more reliable predictions of breeding values (Cappa et al., 2016; Gudeta, 2018). Among them, microsatellites (SSR) are the most widely used markers for different applications (Hodel et al., 2016). The development of microsatellite markers in Eucalyptus has evolved over the years in parallel with the increased availability of sequence information. To date, different types of neutral microsatellites are being used in Eucalyptus and new SSR markers have been developed within the transcribed regions of the genome. In addition, researchers have designed SSR markers after mining the increasingly large EST (Expressed Sequence Tag) collections deposited in sequence databases (Kirst et al., 2005; Lehouque et al., 2008; Faria et al., 2010; Acuña et al., 2012a, b; Zhou et al., 2014; Grattapaglia et al., 2015).
Several strategies have been explored to find loci controlling traits of interest in woody species. Many genomic studies have reported the analysis of genes and transcription factors expressed during wood formation and xylogenesis in Eucalyptus (reviewed by Foucart et al., 2006; Paux et al., 2004; Rengel et al., 2009). In addition, several QTL (Quantitative Trait Loci) approaches have been carried out for genes controlling plant growth traits (reviewed in Gion et al., 2015; Li et al., 2015; Du et al., 2018; Müller et al., 2019; Kainer et al., 2019).
Therefore, the use of already well-established polymorphic markers located in CG for plant traits is an interesting approach for mapping purposes and population genetic studies, with an emphasis on non-model species (Acuña et al., 2012b, 2014; Pomponio et al., 2015; Azpilicueta et al., 2016). Also, the availability of the E. grandis genome sequence, with annotated and classified genes, is an important source of information to study and characterize different traits of interest (Myburg et al., 2014).
Although high-throughput sequence-based SNP marker assays are increasingly becoming available (Aguirre et al., 2019), microsatellites still constitute a very useful and accessible tool for fast and precise genetic analysis in Eucalyptus. Besides genetic diversity studies, SSR have numerous uses, including cultivar or clone fingerprinting, population structure analysis, marker-assisted selection, linkage map development and QTL mapping, among others, and thus retain an important role in this genomic age (Hodel et al., 2016).
In this study, we characterized in silico and validated in vitro new microsatellite markers located in CG (structural genes and transcriptional factors) related to plant growth and development. The selection of SSR located in these genes was based on data from a previous study of our group (Acuña et al., 2012a), with a focus on genome regions potentially involved in these important characteristics for tree breeding. The identified SSR markers were wet-lab validated in five different Eucalyptus species and hybrids, and physically mapped on the E. grandis reference genome. Furthermore, we identified and analyzed known predicted genes contiguous (< 25 kbp) to these SSR markers.
We selected SSR markers on CG from a previous study (Acuña et al., 2012a), in which 1,140 SSR within 952 CG were characterized in silico. These CG had been selected for their possible biological function, predicted according to Gene Ontology (GO) (Ashburner et al., 2000; http://www.geneontology.org/) using Blast2GO (Conesa et al., 2005). From those genes, 129 CG with 134 SSR sequences were selected in the present study based on their correspondence to genes and transcriptional factors involved in different plant growth and developmental features.
Validation was carried out by PCR in a final volume of 12 µl with 20 ng of genomic DNA, 0.25 µM of each primer (Alpha DNA, Canada), 2 mM MgCl2, 0.2 mM of each dNTP, 1X reaction buffer and 1 U Platinum Taq polymerase (Invitrogen, Waltham, USA). Amplifications were performed with an initial denaturation step of 5 min at 94 °C, followed by 35 cycles of 1 min at 94 °C, 1 min at the annealing temperature and 1 min at 72 °C. The final extension step was 10 min at 72 °C. The SSR amplification products were denatured for 5 min at 95 °C in denaturing loading buffer and separated by 6% polyacrylamide gel electrophoresis (6% acrylamide/bisacrylamide 20:1, 7.5 M urea, 0.5X TBE) along with a 25 bp DNA ladder standard (Invitrogen, Waltham, USA). The DNA silver-staining procedure of Promega (Madison, WI, USA) was used for visualization. Details on primer sequences, SSR location, annealing temperature and product sizes are described in Table S1 [suppl.].
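As an illustration of how the final concentrations above translate into pipetting volumes, the following minimal sketch applies the dilution relation C1·V1 = C2·V2 to the 12 µl reaction. The final concentrations come from the text; all stock concentrations are assumptions introduced for the example and are not reported in this study. DNA and polymerase volumes are omitted because they depend on stock concentrations that are not given.

```python
FINAL_VOLUME_UL = 12.0  # final reaction volume stated in the text

# (reagent, stock concentration, final concentration), same unit within a row.
# Final concentrations are from the text; every stock value is an ASSUMPTION.
REAGENTS = [
    ("reaction buffer (X)", 10.0, 1.0),    # assumed 10X stock
    ("MgCl2 (mM)", 50.0, 2.0),             # assumed 50 mM stock
    ("dNTP (mM each)", 10.0, 0.2),         # assumed 10 mM each
    ("forward primer (uM)", 10.0, 0.25),   # assumed 10 uM stock
    ("reverse primer (uM)", 10.0, 0.25),   # assumed 10 uM stock
]


def stock_volume_ul(stock, final, final_volume_ul=FINAL_VOLUME_UL):
    """Solve C1 * V1 = C2 * V2 for V1, the stock volume per reaction."""
    return final * final_volume_ul / stock


volumes = {name: stock_volume_ul(stock, final) for name, stock, final in REAGENTS}

for name, vol in volumes.items():
    print(f"{name:22s} {vol:.2f} ul per reaction")
```

Under these assumed stocks the remaining volume would be made up of template DNA, polymerase and water; with different stocks only the table of stock values needs to change.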
We carried out the in silico characterization of the CG-SSR through physical mapping and a search for nearby genes. The 134 SSR-containing sequences were mapped to the E. grandis reference genome (Myburg et al., 2014) (http://phytozome.jgi.doe.gov, version 2.0). Mapping was performed using the Bowtie2 alignment tool with default settings (Langmead & Salzberg, 2012). A custom Perl script was used to identify the annotated genes of the E. grandis genome within a flanking region of 50 kbp (up to 25 kbp on either side of each SSR locus). The predicted genes classified and reported by Myburg et al. (2014) were used to describe some of the genes found within the window surrounding each SSR.
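The nearby gene search described above can be sketched as a simple interval query. The original analysis used a custom Perl script that is not shown here; the Python version below is only an illustration under stated assumptions: each SSR is reduced to a single mapped position, a gene counts as "nearby" if any part of it overlaps the ±25 kbp window, and the gene records and identifiers are hypothetical placeholders rather than real annotation entries.

```python
from dataclasses import dataclass

WINDOW_BP = 25_000  # 25 kbp on each side of the SSR, as stated in the text


@dataclass(frozen=True)
class Gene:
    gene_id: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def genes_near_ssr(ssr_chrom, ssr_pos, genes, window=WINDOW_BP):
    """Return IDs of genes overlapping [ssr_pos - window, ssr_pos + window]."""
    lo = max(1, ssr_pos - window)
    hi = ssr_pos + window
    return [
        g.gene_id
        for g in genes
        if g.chrom == ssr_chrom and g.start <= hi and g.end >= lo
    ]


# Hypothetical annotation records; real IDs and coordinates would be read
# from the E. grandis genome annotation.
annotation = [
    Gene("geneA", "Chr01", 90_000, 95_000),    # ends 5 kbp upstream: inside
    Gene("geneB", "Chr01", 124_000, 130_000),  # starts 24 kbp downstream: inside
    Gene("geneC", "Chr01", 126_000, 128_000),  # starts 26 kbp downstream: outside
    Gene("geneD", "Chr02", 100_000, 101_000),  # different chromosome: outside
]

nearby = genes_near_ssr("Chr01", 100_000, annotation)
print(nearby)  # ['geneA', 'geneB']
```

Requiring a gene to lie entirely inside the window instead of merely overlapping it would be an equally plausible reading of the method and would change only the comparison in the list comprehension.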

Selected CG-SSR markers
Based on the results from a previous study (Acuña et al., 2012a), we selected sequences similar to structural genes and transcriptional factors involved in plant developmental features and obtained 129 CG with 134 SSR sequences (details of SSR markers in Table S1 [suppl.]). The GO annotation showed the following distribution of the 129 selected CG: most of them belonged to the "Biological Process" category and, within this category, to the subcategories "Metabolic Process of Organic Substances" (17%), "Cellular Metabolic Process" (15%) and "Primary Metabolic Process" (15%), among other less represented subcategories. Also, most of the GO terms within the "Molecular Function" class belonged to the subcategories "Binding to Heterocyclic Compounds" (15%), "Binding to Organic Cyclic Compounds" (15%) and "Ion Binding" (14%). Among them, we detected SSR in transcriptional factor genes involved in xylogenesis (MYB, bZIP, WRKY, SWI/SNF, ARF) (Rengel et al., 2009) and in responses to abiotic stress (BES, bZIP) (Bechtold & Field, 2018) (Table S1 [suppl.]).
The relative proportions of repeat motifs in polymorphic SSR were 28.2% for di-, 56.4% for tri-, 10.2% for tetra-, and 5.2% for pentanucleotides. Our results are similar to those reported by other authors, in which trinucleotide repeats were the most common, followed by di- and tetranucleotide repeats (Varshney et al., 2005). The number of alleles per marker (between 2 and 7) (Table S1 [suppl.]) is equivalent to that reported by other authors for EST-SSR markers validated on a small number of samples (about 8 individuals per species) (Zhou et al., 2014). On the other hand, the values found in the present study are lower than those described by Faria et al. (2010, 2011) and Grattapaglia et al. (2015) in Eucalyptus. This difference could be explained by the marker selection criteria used: while those studies based their selection on the polymorphism level, we selected SSR markers focusing on their putative function in growth and plant development. Nevertheless, the number of alleles per marker may increase with a larger sample size.
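The motif proportions given above can be related back to marker counts with a short calculation. This sketch assumes that the percentages refer to the 39 polymorphic SSR reported in this study; the counts it produces are derived from the rounded percentages and are not figures stated in the text.

```python
N_POLYMORPHIC = 39  # polymorphic SSR reported in this study

# Motif proportions (%) among polymorphic SSR, as given in the text.
motif_percent = {"di": 28.2, "tri": 56.4, "tetra": 10.2, "penta": 5.2}

# Nearest whole number of markers implied by each percentage (ASSUMES the
# denominator is the 39 polymorphic SSR).
motif_counts = {
    motif: round(N_POLYMORPHIC * pct / 100) for motif, pct in motif_percent.items()
}

for motif, count in motif_counts.items():
    print(f"{motif:6s} {count:2d} markers")
print("total:", sum(motif_counts.values()))
```

The implied counts add up to the 39 polymorphic markers, which is consistent with the assumed denominator.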

Physical mapping and nearby gene search
The alignment of the 134 CG-SSR sequences against the E. grandis public reference genome revealed that 131 SSR mapped on the 11 chromosomes, while three of the markers were located on scaffolds. The number of SSR per chromosome ranged between 4 (chromosome 5) and 17 (chromosomes 3, 8 and 11), thus showing a good distribution across the genome (Table S2 [suppl.]).
According to an exhaustive literature review of the publications that developed this kind of marker in Eucalyptus, only 16 of the 134 SSR validated here coincide with those of other studies (2 in Yasodha et al., 2008; 8 in Rengel et al., 2009; 4 in He et al., 2012; 4 in Zhou et al., 2014 and 1 in Grattapaglia et al., 2015; the per-study counts exceed 16 because some markers were shared between studies). Nonetheless, none of these studies characterized the markers in relation to plant growth and developmental traits. Moreover, only in this study and in that by Grattapaglia et al. (2015) were EST-SSR aligned to the E. grandis genome sequence, thus providing information on their distribution and physical position (Table S1 [suppl.]).
Interestingly, the 39 polymorphic SSR markers are located in protein-coding CG, e.g. serine-threonine kinase (which is involved in the completion of embryonic development in dormant seeds), F-box type (signal transduction and cell cycle) (Jia et al., 2020) and various transcription factors that regulate processes such as cellular development, seed maturation and floral development, among others (bZIP, GATA, BES1) (Bechtold & Field, 2018) (Table S1 [suppl.]).
Additionally, we searched for genes of interest that could be linked to the identified SSR within a flanking window of 50 kbp (25 kbp up- and downstream). This window size was selected based on the linkage disequilibrium (LD) reported for E. grandis. This analysis yielded 773 predicted E. grandis genes (named Eucgr. in Myburg et al., 2014) neighbouring these SSR. Among them, 394 were within a Gene Ontology (GO) category (Table S2 [suppl.]). Based on Myburg et al. (2014), who classified predicted genes according to different classes related to wood quality, 30 of the 773 genes belong to the following categories: 3 to "MYB Transcription Factors", 1 to "Genes Encoding Laccases and Peroxidases", 8 to "Lignin Biosynthesis", 7 to "predicted cellulose and xylan genes" and 11 to "Interpro Domain of 968 Unique Eucalyptus Genes". This categorization gives these markers an added value, since we detected genes related not only to plant growth and development, but also to wood quality (Table S2 [suppl.]). Examples of genes related to wood quality are the cinnamoyl CoA reductase (CCR), phenylalanine ammonia-lyase (PAL), 4-coumarate-CoA ligase (4CL) and caffeic acid O-methyl transferase (COMT) genes, which code for key enzymes in lignin biosynthesis (Boerjan et al., 2003). Other identified genes include PARVUS, cellulose synthase (CESA) and sucrose synthase (SUSY), which are involved in cellulose and xylan biosynthesis (Myburg et al., 2014).
Forest Systems December 2020 • Volume 29 • Issue 3 • eSC08
In this work, the candidate gene sequences evaluated were at most 25 kbp away from the validated SSR markers. Therefore, linkage between them seems to be high enough to make this panel of SSR markers useful in future association mapping studies for Eucalyptus breeding purposes.

Conclusions
In the present study, a new set of SSR specifically located in candidate genes for growth and plant development is proposed as a tool for Eucalyptus genetic analysis. Additionally, some of the SSR are particularly interesting because they are close to candidate genes for wood quality.
These new CG-SSR markers, in addition to those already publicly available, could be used for the identification of different Eucalyptus genetic materials, in population genetics and taxonomy, and for the verification of synteny and collinearity between different Eucalyptus maps. Furthermore, they could be implemented in QTL and association mapping studies, and in genomic selection through relatedness correction in breeding value predictions.