Variation of five major glucosinolate genes in Brassica rapa in relation to Brassica oleracea and Arabidopsis thaliana

Glucosinolates and their derivatives, the isothiocyanates, are important secondary metabolites in the Brassicaceae that have biological activity, such as cancer-protective and biofumigant properties. The putative orthologs of five major genes in the glucosinolate biosynthetic pathway, Bra.GSELONG.a, Bra.GSALK.a, Bra.CYP83B1, Bra.SUR1.a and Bra.ST5.a, were cloned from both cDNA and genomic DNA of different subspecies of Brassica rapa. Interspecies comparative analysis revealed high conservation of exon number and size for GS-Elong, GS-Alk, GS-CYP83B1 and GS-ST5a among B. rapa, B. oleracea and A. thaliana. Splice site mutations caused the differences observed in exon number and size for GS-SUR1 among the three species. However, the exonic sequences were highly conserved for this gene. There were no major differences in intron size among the three species for these genes, except for intron 1 of GS-Elong in two subspecies of B. rapa. The cloning of the putative orthologs of all these major genes involved in the glucosinolate biosynthesis pathway of B. rapa, together with their sequence analysis, provides a useful base for their genetic manipulation and functional analysis.
Additional key words: Chinese cabbage, comparative genomics, glucosinolates, rapeseed, turnip.


Introduction
Glucosinolates and their breakdown products, the isothiocyanates, are important secondary metabolites in the Brassicaceae. These compounds possess biological activity ranging from cancer protection to biofumiga-
methylthioalkylmalate synthase (MAM) gene family are known to be involved in catalyzing the side-chain elongation cycle of aliphatic GSL biosynthesis in A. thaliana. MAM1 (At5g23010) and MAM3 (At5g23020) are tandem duplicates on chromosome 5 at the GS-Elong locus (Kroymann et al., 2001; Field et al., 2004; Textor et al., 2004; Benderoth et al., 2006; Textor et al., 2007). Aliphatic GSL might be modified further by other genes, such as GS-Alk, which is responsible for the conversion of methylsulfinylalkyl to alkenyl GSL (Mithen et al., 1995). During the biosynthesis of the glycone moiety, cytochrome-P450-dependent monooxygenases in the CYP79 gene family are responsible for catalyzing the conversion of amino acids to aldoximes. Five of the seven CYP79 homologs identified in the Arabidopsis genome have been shown to be involved in this process (Hansen et al., 2001; Reintanz et al., 2001; Chen et al., 2003; Halkier and Gershenzon, 2006). CYP83A1 (At4g13770) and CYP83B1 (At4g31500), two members of another cytochrome P450 gene family in Arabidopsis, have been proposed to function in the oxidation of aldoximes during GSL biosynthesis (Hansen et al., 2003). CYP83B1 has higher affinity for the indole-3-acetaldoxime derived from tryptophan and for aromatic aldoximes derived from phenylalanine or tyrosine (Naur et al., 2003).
In Arabidopsis, S-alkyl thiohydroximate is then cleaved and converted to thiohydroximic acid by the C-S lyase encoded by the SUR1 gene (At2g20610), which lacks side chain specificity (Mikkelsen et al., 2004). The final step in the biosynthesis of the GSL core structure is catalyzed by a desulfoglucosinolate sulfotransferase. Genetic and biochemical characterization of a small gene family, AtST5, coding for sulfotransferases in A. thaliana showed that ST5a (At1g74100) prefers tryptophan- and phenylalanine-derived desulfoglucosinolates as substrates, whereas ST5b (At1g74090) and ST5c (At1g18590) prefer long chain aliphatic desulfoglucosinolates (Piotrowski et al., 2004). The subsequent side-chain modification reactions determine the final structure of GSL. Methionine-derived aliphatic GSL are known to be the most extensively modified, and several genetic loci have been mapped in Arabidopsis and Brassica cultivars (Parkin et al., 1994; Mithen et al., 1995; Giamoustaris and Mithen, 1996; Hall et al., 2001).
Most of the research on GSL biosynthesis has been done in the model plant A. thaliana. Some work has been extended to crops of economic importance belonging to the species B. oleracea. The two major genes involved in aliphatic GSL side-chain elongation and modification, BoGSL-Elong and BoGSL-Alk, have been cloned (Li and Quiros, 2002, 2003). Inheritance analysis also demonstrated that another gene, BoGSL-PRO, is responsible for three-carbon side chain elongation (Li et al., 2001). Two B. oleracea BAC clones, which contain the orthologs of the AOP (B21H13) and MAM (B21F5) families, were identified and used for comparative analysis between A. thaliana and B. oleracea (Gao et al., 2004, 2006). The work presented here focused on the cDNA and genomic DNA cloning of the putative orthologs of five major genes involved in the B. rapa glucosinolate biosynthetic pathway: Bra.GSELONG.a, Bra.GSALK.a, Bra.CYP83B1, Bra.SUR1.a and Bra.ST5.a. The sequence analysis of the cDNA clones provides a useful base for further functional analysis of these genes and will help to elucidate the glucosinolate biosynthesis pathway of B. rapa in the future.

Plant material
Eight varieties representing three subspecies and including five different B. rapa crops (Chinese cabbage, turnip, turnip tops, oriental cabbage and Pak choi) were used in this study (Table 1). The source of the plant material was the Brassica working collection of the Department of Plant Sciences at UC Davis. All plants were grown in the greenhouse and were sampled six weeks after germination for DNA and RNA isolation. Genes were named following the standard nomenclature proposed by Østergaard and King (2008).

DNA isolation
DNA was isolated using a modified CTAB method (Saghai-Maroof et al., 1984) in 2x DNA extraction buffer (1.4 M NaCl, 100 mM Tris-HCl, 20 mM EDTA, 2% CTAB, pH 8.0). The ground leaf samples were incubated for 1.5 hr at 65°C with occasional gentle mixing. An equal volume of chloroform:isoamyl alcohol (24:1) was added and mixed gently but thoroughly, followed by centrifugation at 9,000 rpm for 10 min. The supernatant was mixed gently with an equal volume of isopropanol at room temperature. The pellet obtained by centrifugation was resuspended in TE buffer. RNA was removed with RNase (50 µg mL⁻¹) (Roche Applied Science, IN), and the RNase was removed with chloroform:isoamyl alcohol (24:1) followed by centrifugation. DNA was finally precipitated by mixing gently with 1/10 volume of 3 M sodium acetate solution, followed by 2x volume of 100% ethanol (-20°C, at least 30 min). The pellet was then washed twice with 75% ethanol. DNA was dissolved in TE buffer and the sample concentrations were determined with a spectrophotometer (NanoDrop ND-1000, Thermo Scientific, MA).
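The volume ratios in the final precipitation step can be turned into a small calculation. The following is a minimal sketch, not part of the original protocol: the function name, the dictionary keys and the 400 µL example are hypothetical, and it assumes that the 2x ethanol volume refers to the sample volume before the sodium acetate is added, which the text does not specify.

```python
def precipitation_volumes(sample_ul: float) -> dict:
    """Return reagent volumes (in µL) for ethanol precipitation of a DNA sample."""
    # 1/10 volume of 3 M sodium acetate, as stated in the protocol
    acetate_ul = sample_ul / 10
    # Assumption: 2x volume of 100% ethanol relative to the original sample volume
    ethanol_ul = 2 * sample_ul
    return {
        "sodium_acetate_3M": acetate_ul,
        "ethanol_100pct": ethanol_ul,
        "total": sample_ul + acetate_ul + ethanol_ul,
    }


# Hypothetical example: a 400 µL aqueous DNA sample
print(precipitation_volumes(400.0))
```

The total is useful mainly to check that the mixture still fits in the chosen tube.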

RNA isolation and RT-PCR
RNA from leaf tissue was isolated with Trizol® Reagent (Invitrogen, CA) according to the manufacturer's instructions, and the concentration was determined with a spectrophotometer (NanoDrop ND-1000). First-strand cDNA was synthesized using M-MLV reverse transcriptase following the manufacturer's protocol (Invitrogen), except that only 1/10 of the specified concentration of the oligo(dT)15 primer was used and the incubation time at 37°C was extended from 50 min to 2 hr. A 200 µL reaction mixture was used for 50 µg of total RNA. The reverse transcription product was purified using the QIAquick PCR purification kit according to the manufacturer's instructions (Qiagen, CA), except that the cDNA was eluted twice with 50 µL of Elution Buffer (EB) in the last step. Consensus primers were designed based on sequences of the glucosinolate biosynthesis genes reported in either A. thaliana or B. oleracea to amplify the full-length corresponding gene in B. rapa. The rest of the primers in this study were designed directly from the B. rapa BAC end sequence database (Table 2). PCR reactions were performed in a 50 µL final volume containing 200 ng of cDNA, 1 unit of high fidelity Platinum Taq DNA polymerase (Invitrogen), 1x Platinum Taq DNA polymerase buffer, 2 mM MgSO4, 0.2 mM dNTPs and 10 pmol of each primer, with the following reaction conditions: 94°C for 2 min, followed by 30 cycles of 94°C for 30 s, 55°C for 30 s and 68°C for 2 min, and a final extension at 68°C for 30 s.
The consensus primers YB9 and YB40 (Table 2), designed for amplification of the full-length Bra.GSELONG.a in B. rapa, were based on the 5' and 3' UTR sequences of the corresponding B. oleracea gene BoGSL-Elong. Primers IPM2 and PM44 (Table 2), designed for subsequent cloning and sequencing of this gene, were based on the first and the last exon sequences of BoGSL-Elong, respectively (Li and Quiros, 2002). The consensus primers ODD48 and ODD62 (Table 2), designed for amplification of the full-length Bra.GSALK.a gene in B. rapa, were based on the 5' and 3' UTR sequences of BoGSL-ALK. ODD57 and ODD62 [...] BoGSL-ALK (Li and Quiros, 2003). Three B. rapa genes corresponding to the A. thaliana orthologs in the glycone formation pathway, CYP83B1, SUR1 and ST5a, were amplified from cDNA of young leaves from several different varieties of B. rapa. All the primers designed for the amplification of these three genes were based on the public B. rapa BAC end sequence database from GenBank and were located in the 5' or 3' UTR of the genes (Table 2).
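The thermal profile given for the PCR reactions can be written down as a small data structure, which makes the total programmed hold time easy to check. This is a minimal sketch: the class and function names are hypothetical, only the temperatures, durations and cycle number come from the text, and ramping time between temperatures is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: int
    seconds: int


# Values taken from the reaction conditions described in the text
INITIAL = Step("initial denaturation", 94, 120)
CYCLE = (
    Step("denaturation", 94, 30),
    Step("annealing", 55, 30),
    Step("extension", 68, 120),
)
N_CYCLES = 30
FINAL = Step("final extension", 68, 30)


def hold_time_seconds() -> int:
    """Sum of programmed hold times, excluding ramping between temperatures."""
    per_cycle = sum(step.seconds for step in CYCLE)
    return INITIAL.seconds + N_CYCLES * per_cycle + FINAL.seconds


total = hold_time_seconds()
print(f"{total} s = {total / 60} min")
```

With these values the programmed holds add up to 5550 s, or 92.5 min.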

Cloning and sequencing
The amplified fragments were resolved in a 1% w/v agarose gel, eluted and purified using the QIAquick gel extraction kit (Qiagen) according to the manufacturer's protocol. The pGEM-T Easy vector system I (Promega, WI) was used to clone the fragments. Both the ligation reaction and the transformation followed the manufacturer's protocols. After preparation of the cloned products with the QIAprep Spin Miniprep kit (Qiagen), the nucleic acid sequences were determined using an ABI 3730 capillary electrophoresis genetic analyzer at the UC Davis DBS (Division of Biological Sciences) DNA sequencing facility. To avoid PCR-based mutations, clones of fragments were prepared from three independent amplifications. Three clones from each amplification product were sequenced.
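Sequencing three clones from each of three independent amplifications yields nine reads per fragment, so that an error introduced by the polymerase in one amplification is outvoted by the others. The text does not say how the reads were reconciled; the following is a minimal sketch assuming a simple per-position majority vote over reads that are already aligned to the same length. The function name and the toy reads are hypothetical.

```python
from collections import Counter


def majority_consensus(clones: list[str]) -> str:
    """Per-position majority vote over pre-aligned, equal-length clone reads."""
    if len({len(clone) for clone in clones}) != 1:
        raise ValueError("clone reads must be aligned to the same length")
    consensus = []
    for column in zip(*clones):
        base, count = Counter(column).most_common(1)[0]
        # Keep the base only if it has a strict majority; otherwise mark it as N
        consensus.append(base if count > len(clones) / 2 else "N")
    return "".join(consensus)


# Toy data: nine reads, one of which carries a single PCR-like error (T -> A)
reads = ["ATGGCTTCA"] * 8 + ["ATGGCATCA"]
print(majority_consensus(reads))
```

A position without a strict majority is reported as N, so it is flagged for inspection instead of being resolved silently.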

Sequence analysis
The sequences were viewed using ContigExpress of Vector NTI Advance 9 (Invitrogen). The coding regions of the genes were deduced by comparing the genomic DNA sequences with the cDNA sequences of the genes. The alignments among different crops or species were performed using AlignX of Vector NTI Advance 9 (Invitrogen). The dotplot matrices were constructed using DNADOT (http://arbl.cvmbs.colostate.edu/molkit/dnadot/).
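The idea behind a dotplot matrix can be illustrated with a short, generic sketch. It is not the DNADOT implementation, whose algorithm and parameters are not described here: it simply marks every pair of positions at which two sequences share an exact match over a fixed window. The function names, the window size and the toy sequences are assumptions for illustration.

```python
def dotplot(seq_a: str, seq_b: str, window: int = 3) -> list[list[bool]]:
    """Mark cell (i, j) when the windows starting at i in seq_a and j in seq_b match exactly."""
    rows = len(seq_a) - window + 1
    cols = len(seq_b) - window + 1
    return [
        [seq_a[i:i + window] == seq_b[j:j + window] for j in range(cols)]
        for i in range(rows)
    ]


def render(matrix: list[list[bool]]) -> str:
    """Draw the matrix as text, with '*' for a match and '.' otherwise."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in matrix)


# Toy sequences: shared flanking "exons" with a different "intron" in between
gene_a = "ATGGCTGTAAGTTCAGAT"
gene_b = "ATGGCTGTCCCCAGTCAGAT"
print(render(dotplot(gene_a, gene_b)))
```

Conserved regions appear as diagonal runs, and an insertion or deletion shifts the diagonal, which is how intron size differences show up in such plots.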

Bra.GSELONG.a, side chain elongation gene
This gene was amplified from both cDNA and genomic DNA of young leaves of turnip 'Yorii spring' (B0493) and Chinese cabbage 'Matsushima' (B0488). It has 10 exons in both B. oleracea and B. rapa. Its genomic sequences differed considerably between turnip and Chinese cabbage; both crops belong to the species B. rapa but to different subspecies, rapa and pekinensis, respectively. The turnip allele (EF611254, 3896 bp) is much longer than the Chinese cabbage allele (EF611255, 2833 bp). This difference is due to the large size difference for introns 1 and 2 and, to a lesser extent, for the other introns (Fig. 1a). The exonic regions and the encoded amino acid sequences of both alleles are 100% identical with each other and 98% similar to those of B. oleracea BoGSL-Elong (AF399834) (Tables 3 and 4). Because of the large differences in size for introns 1 and 2 in the B. rapa alleles, the turnip gene has higher similarity to BoGSL-Elong (96% in genomic DNA) than to the Chinese cabbage allele (73% in genomic DNA, Table 3). Dotplot matrices of BoGSL-Elong versus the turnip and Chinese cabbage Bra.GSELONG.a genomic DNA sequences show that the differences between the genes of the two species are primarily due to their intronic regions (Figs. 2a and 2b).
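The size difference between the two B. rapa alleles follows directly from the reported lengths. The sketch below is a worked calculation only; the variable and function names are hypothetical, and it relies on the statement in the text that the exonic regions of the two alleles are 100% identical, so that the whole length difference can be attributed to introns.

```python
# Genomic lengths reported in the text (GenBank accessions in comments)
TURNIP_BP = 3896           # EF611254, turnip 'Yorii spring'
CHINESE_CABBAGE_BP = 2833  # EF611255, Chinese cabbage 'Matsushima'


def length_difference(longer_bp: int, shorter_bp: int) -> tuple[int, float]:
    """Return the difference in bp and as a percentage of the longer allele."""
    difference = longer_bp - shorter_bp
    return difference, 100.0 * difference / longer_bp


diff_bp, diff_pct = length_difference(TURNIP_BP, CHINESE_CABBAGE_BP)
print(f"{diff_bp} bp, {diff_pct:.1f}% of the turnip allele")
```

The two alleles differ by 1063 bp, a little over a quarter of the length of the turnip allele.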
The identity between the amino acid sequences of the Bra.GSELONG.a alleles and the A. thaliana members of the MAM gene family, to which the gene GS-Elong belongs, ranged from 55% to 80% (Table 4). The Bra.GSELONG.a alleles showed higher similarity with MAM1 and MAM2 than with the other three members (MAM3, MAML-3 and MAML-4).
The sequences of B. rapa corresponding to genes CYP83B1, SUR1 and ST5a were amplified from both cDNA and genomic DNA of young leaves of turnip 'Yorii spring' (B0493, Table 1) and Chinese cabbage 'Matsushima' (B0488).
The coding and genomic DNA sequences of CYP83B1 from these two crops (EF611258, EF611259) are 100% identical (Table 5). Both CYP83B1 from A. thaliana and its corresponding gene Bra.CYP83B1 have two exons of conserved sizes and one intron, which is 57 bp shorter in A. thaliana. These sequences share 89% similarity for the genomic DNA sequences and 90% similarity for the coding sequences (Table 5).
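Percentages such as those in Table 5 are computed from pairwise alignments. How AlignX counts gapped columns is not stated here, so the following is only a minimal sketch of one common definition: identical columns divided by all aligned columns, with gap-to-residue columns counted as mismatches. The function name and the toy alignment are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of aligned columns that carry the same residue in both sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue  # columns that are gaps in both sequences carry no information
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy alignment: 10 columns, one substitution and one gap
print(percent_identity("ATGGCTTCAG", "ATGGCATC-G"))
```

Other definitions divide by the length of the shorter sequence or exclude gapped columns, which can change the value by a few percentage points for sequences with large indels.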
The genomic sequences of the Bra.SUR1.a alleles of turnip (EF611281) and Chinese cabbage (EF611282) have 100% similarity (Table 6). There is 83% similarity between the A. thaliana SUR1 and Bra.SUR1.a genomic DNA sequences, and 87% similarity between their coding sequences. Likewise, there is 81% similarity between the A. thaliana SUR1 and BoGSL-SUR1 genomic DNA sequences, and 87% similarity between their coding sequences. For BoGSL-SUR1 and Bra.SUR1.a, there is 89% similarity between the genomic DNA sequences and 92% similarity between the coding sequences (Table 6). A. thaliana SUR1 has seven exons, whereas Bra.SUR1.a and BoGSL-SUR1 have five exons. This change seems to be due to the loss of introns 2 and 3 and the fusion of exons 2, 3 and 4 in Bra.SUR1.a and BoGSL-SUR1, plus other minor base substitutions in the remaining introns and exons (Fig. 1d). The conservation of the exonic regions of Bra.SUR1.a and A. thaliana SUR1 was high. Only the first and fourth exons had the same size in Bra.SUR1.a and BoGSL-SUR1.
A. thaliana ST5a and its B. rapa counterpart Bra.ST5.a (EF611261, EF611262) are intronless. The similarity between their encoded amino acid sequences is 92% (Table 7). The coding DNA sequences of Bra.ST5.a were also amplified from the other six varieties of B. rapa (EF611263 to EF611270) listed in Table 1. Two copies were found in turnip tops 'Rapa 60 Giorni'. The longer copy, Bra.ST5.a (EF611269, 1020 bp), has higher identity with the corresponding A. thaliana ST5a gene than the shorter Bra.ST5.b (EF611270, 261 bp). For all the other seven varieties Bra.ST5.a was cloned and sequenced. They are all identical to each other, but the Bra.ST5.a allele of 'Rapa 60 Giorni' has two amino acid substitutions.

Bra.GSALK.a, side chain modification gene
[...] (Table 8). However, BoGSL-Alk has three exons whereas the B. rapa alleles have four. This is due to the splitting of exon 2 in the latter species. However, the total length of exons 2 and 3 and their intervening intron is only 1 bp longer than exon 2 in B. oleracea (691 bp). Additionally, the last exon is one base shorter in both alleles of B. rapa. The lengths of the introns are also different between the two genes (Fig. 1b), and the similarity between their encoded amino acid sequences is 97% (Table 9). The coding DNA sequence of GS-Alk in A. thaliana (780 bp) is shorter than those of both BoGSL-Alk and Bra.GSALK.a. The identity between the encoded amino acid sequences of this gene in B. rapa and its corresponding genes in the A. thaliana AOP family ranged from 36% to 64% (Table 9). Bra.GSALK.a showed higher amino acid sequence similarity with AOP1 (64%) than with the other two members of the AOP family [AOP2 (GS-ALK), 45%; AOP3 (GS-OHP), 36%]. However, the comparison among the full-length cDNA sequences of these A. thaliana genes and Bra.GSALK.a shows that GS-AOP2 has higher similarity (76%) with the B. rapa gene than GS-AOP1 (62%) and GS-AOP3 (36%). The reason for these differences is the presence of conserved exonic regions in the 5' UTR and 3' UTR of GS-AOP2 and Bra.GSALK.a.
Southern blot analysis suggests the existence of two to four copies of Bra.GSALK.a in B. rapa (data not shown). Therefore, other duplicate GS-Alk genes must be isolated before attempting to assign orthology among species, considering that these genes are also duplicated in B. oleracea (Li and Quiros, 2003).

Discussion
Only one copy corresponding to BoGSL-Elong could be amplified and cloned from B. rapa in our study. The sequence comparison showed that this copy has higher similarity to MAM1 than to the other members of the MAM family in A. thaliana (Graser et al., 2000; Kroymann et al., 2001; Field et al., 2004; Textor et al., 2004, 2007; Benderoth et al., 2006). Therefore, it is likely that Bra.GSELONG.a is the ortholog of this gene, considering its high amino acid sequence similarity to BoGSL-Elong. Orthology between this B. oleracea gene and MAM1 was previously established by Li and Quiros (2002) and Gao et al. (2006). It is interesting that the large size variation between the turnip and Chinese cabbage alleles of this gene is due mostly to the large deletion in intron 1 in the latter, which might have occurred during the independent domestication of these two crops (McGrath and Quiros, 1992). The Chinese cabbage accession carrying this deletion contains lower amounts of 4-carbon side chain aliphatic GSLs, which are the products of this gene, than the turnip accession. It is unknown whether this deletion affects gene function, considering that it is in a non-translated region. On the other hand, this gene is non-functional in most cauliflowers due to a splicing mutation in intron 3, resulting in a longer transcript (Li and Quiros, 2002).
Little or no variation was observed for the different alleles of the three glycone formation genes in the three species except for the exon fusion in the cultivated

Figure 1. Gene structure diagrams comparing exon number and size among different crops and species. Exons are represented by boxes, introns are represented by lines. a) BoGSL-Elong and its two corresponding B. rapa alleles, Bra.GSELONG.a-1 and Bra.GSELONG.a-2 for turnip and Chinese cabbage, respectively. b) BoGSL-Alk and its two corresponding B. rapa alleles, Bra.GSALK.a-1 and Bra.GSALK.a-2 for turnip and Chinese cabbage, respectively. c) CYP83B1 in A. thaliana, B. oleracea and its two corresponding B. rapa alleles Bra.CYP83B1-1 and Bra.CYP83B1-2 in turnip and Chinese cabbage, respectively. d) SUR1 in A. thaliana, BoGSL-SUR1 and its two corresponding B. rapa alleles Bra.SUR1.a in turnip and Chinese cabbage.

Table 1. Index for Brassica rapa varieties. a Accession number. b Oriental cabbage. c Chinese cabbage.

Table 2. Primer sequences used for DNA amplification

Table 4. Identity for amino acid sequences (in %) of GS-Elong gene and its counterparts in B. oleracea, B. rapa and A. thaliana (MAM gene family)

Table 5. Identity of nucleotide sequences (in %) for genomic (g) and coding sequences (CDS) of CYP83B1 and its counterparts in A. thaliana and two varieties of B. rapa (B0493 and B0488).

Table 6. Identity of nucleotide sequences (in %) for SUR1 and its counterparts in A. thaliana, B. oleracea and two varieties of B. rapa (B0493 and B0488)

Table 7. Identity of amino acid sequences (in %) for ST5a gene and its counterparts in A. thaliana (At) and eight accessions a of B. rapa. a Varieties corresponding to accession numbers are listed in Table 1. The two copies of this gene in B1668 are listed as ST5a and ST5b.

Table 8. Identity for nucleotide genomic (g) and coding (CDS) sequences (in %) for GS-Alk gene and its counterparts in B. oleracea and two varieties of B. rapa (B0493 and B0488)

Table 9. Identity of amino acid sequences (in %) for GS-Alk gene and its counterparts in B. oleracea, B. rapa and A. thaliana (AOP family)