Usefulness of portable near infrared spectroscopy in olive breeding programs

The usefulness of portable near infrared (NIR) spectroscopy as a simple and efficient method to determine some of the main selection traits in olive breeding is evaluated in this work. Calibration models were developed and evaluated using partial least squares (PLS) regression from samples collected in different selection steps of the breeding work and under different experimental conditions. The results showed that sufficiently accurate models (correlation between actual and predicted constituent values higher than 0.9) were obtained for oil and moisture content in both cross-validation and prediction. Portable NIR spectroscopy could be used to select genotypes on the basis of these characters, as it ranks genotypes similarly to the reference methods both in different selection steps of the breeding process (progenies and selection plots) and under different experimental conditions (on-tree or in the laboratory). The advantages of this technique for improving the efficiency of the evaluation process in olive breeding programs are discussed.


Introduction
Near infrared (NIR) spectroscopy offers rapid, non-destructive analysis without sample preparation or chemical reagents, which makes the technique an attractive alternative to traditional, tedious and more time-consuming analytical methods. Instrumental evolution in recent years has opened new possibilities for faster analysis, smaller sample sizes, and more portable devices that allow on-field analysis (Nicolaï et al., 2007). The development of portable instruments brings the possibility of on-tree analysis in the field, avoiding the need to transport samples to the laboratory, which could be particularly useful for the evaluation process in multi-location trials in several environments. Moreover, portable instruments allow analysis when only a small number of fruits is available, as usually occurs in the initial selection steps of breeding work. Several applications of portable NIR have recently been reported for the determination of fruit quality characters in fruit species (Kusumiyati et al., 2008; Camps & Christen, 2009; Perez-Marin et al., 2009). However, most of them focused on determining the optimal harvesting time or on sorting fruits according to consumer taste preferences, and direct application in breeding work has not been suggested. In olive, recent works reported promising results for the prediction of fruit parameters with portable spectrometers working under laboratory conditions, although application in breeding work was not suggested (Cayuela et al., 2009; Cayuela & Pérez-Camino, 2010).
The aim of this work was to assess the feasibility of portable NIR technology for the determination of oil and moisture contents, the two main components of olive fruit, as a decision support tool in olive breeding programs. To this end, two different experiments were carried out with samples collected in different selection steps of the breeding work (progenies and selection plots) and under different experimental conditions (on-tree and in the laboratory).

Plant material
Plant material corresponding to two different selection steps of the breeding program (progenies and selections) was used in this study.
In Spain, an olive breeding program was initiated in 1991 to obtain new olive cultivars for olive oil production. Early bearing, high yield and high oil content are some of the main objectives of the breeding program (León et al., 2007). The selection procedure is carried out in several steps, from the germination of seeds to the final registration of new cultivars. Several genotypes, usually around 5-10% of the initial progeny populations, are selected after two harvest seasons (4-5 years after planting) and vegetatively propagated for further evaluation. Several comparative field trial steps are then carried out sequentially with the selected genotypes. In each step the number of selected genotypes is reduced and the number of replications per genotype increases, up to a final step of comparative field trials in several environments that includes only a small number of genotypes. Therefore, a high number of samples must be analyzed every year, particularly for general characters such as oil and moisture contents, which are evaluated in all the steps of the selection procedure. The possibility of rapid, simultaneous and non-destructive analysis of intact olive fruits, with the resulting cost and labor savings compared with slow chemical analyses, makes NIR a potentially useful tool for selection. In olive, as in other fruit species, the length of the juvenile period and the need to evaluate a large number of genotypes have traditionally been considered the main drawbacks and the reasons for the limited breeding efforts carried out until recently. In recent years, several works have allowed the development of methodologies to shorten the juvenile period (Lavee et al., 1996; Santos-Antunes et al., 2005), but there is still a need for fast and cheap analytical procedures to determine the agronomic traits of interest.
In recent decades, NIR spectroscopy has gained importance for non-destructive measurement of several characters in a wide range of fruit and vegetable products, including olive (Nicolaï et al., 2007). In olive, previous works have demonstrated the feasibility of NIR for rapid analysis, both for quantitative analysis of the main components and for discrimination of samples according to cultivar origin or common alterations of olives (reviewed by Armenta et al., 2010). However, these works were carried out using benchtop instruments, which require relatively large amounts of sample, and under laboratory conditions, and direct analysis of intact fruits has been scarcely reported.
[...] in 2004 and 2005 respectively, and seedlings were subjected to the usual protocol followed in the breeding program (Santos-Antunes et al., 2005). A total of 183 seedlings were evaluated in these plots.
Two selection plots were also evaluated, including genotypes selected in progenies from crosses carried out in 1998/99. Genotypes were selected according to the procedure followed in the breeding program (León et al., 2007) and vegetatively propagated to establish these two selection plots in 2007. Sixty-seven selections with two trees per selection and 11 selections with four trees per selection were sampled in the first and second plot respectively.

Spectral collection and analysis by reference methods
In the progenies plots, NIR spectra of fruits were obtained on-tree in the open field, and the fruit samples were afterwards taken to the laboratory. In the selection plots, fruits were harvested and brought to the laboratory, where spectral collection was carried out. All spectra were acquired in absorbance [log (1/Reflectance)] with a portable acousto-optic tunable filter (AOTF) NIR spectrophotometer (Luminar 5030, Brimrose Corp., MD, USA) in the wavelength region from 1,100 to 2,300 nm at 1 nm intervals. The instrument was configured to obtain an average of 50 spectra for each individual olive fruit, acquired along the equatorial circumference of the fruit. Ten fruits per plant were scanned in both cases.
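The relation between reflectance and absorbance used above, and the averaging of several fruit spectra into one spectrum per plant, can be sketched as follows. This is only an illustration: the function names and the reflectance readings are hypothetical, and the instrument software may well report absorbance directly.

```python
import math

def absorbance(reflectance):
    """Convert reflectance values (0 < R <= 1) to absorbance, log10(1/R)."""
    return [math.log10(1.0 / r) for r in reflectance]

def mean_spectrum(spectra):
    """Average several equal-length spectra wavelength by wavelength."""
    n = len(spectra)
    return [sum(values) / n for values in zip(*spectra)]

# Hypothetical reflectance readings for three fruits at four wavelengths
# (the study scanned ten fruits per plant over 1,100-2,300 nm).
fruits = [
    [0.50, 0.40, 0.25, 0.10],
    [0.50, 0.40, 0.25, 0.10],
    [0.50, 0.40, 0.25, 0.10],
]
plant_spectrum = mean_spectrum([absorbance(f) for f in fruits])
```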
After spectral collection, fruit samples were processed in the laboratory for analysis by reference methods. Fresh samples were weighed and then dried in a forced-air oven at 105ºC for 42 h to determine moisture content. Oil content of dried samples was measured with an NMR Minispec NMS100 analyzer (Bruker Optik GmbH, Ettlingen, Germany). Reference analyses were carried out in triplicate and average data were used for the development of NIR models.
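As a worked illustration of the oven-drying reference method, moisture content can be computed from fresh and dry weights and averaged over the triplicate. The formula (weight loss as a percentage of fresh weight) is the usual gravimetric definition and is assumed here; the weights are hypothetical.

```python
def moisture_percent(fresh_weight_g, dry_weight_g):
    """Moisture as a percentage of fresh weight (assumed gravimetric definition)."""
    if fresh_weight_g <= 0 or not 0 <= dry_weight_g <= fresh_weight_g:
        raise ValueError("weights must satisfy 0 <= dry <= fresh and fresh > 0")
    return 100.0 * (fresh_weight_g - dry_weight_g) / fresh_weight_g

def triplicate_mean(values):
    """Average of replicate determinations."""
    return sum(values) / len(values)

# Hypothetical triplicate: (fresh weight, dry weight) in grams.
replicates = [(20.0, 9.0), (20.0, 9.2), (20.0, 8.8)]
moisture = triplicate_mean([moisture_percent(f, d) for f, d in replicates])
```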

Data analysis
Spectral data were exported to the Unscrambler software (CAMO A/S, Trondheim, Norway) for chemometric analysis. The average spectrum of the 10 fruits per plant was used in all subsequent analyses. Calibration models were developed and evaluated using partial least squares (PLS) regression on raw data, and full cross-validation (i.e. leave-one-out) was used to determine the performance of the models. The number of PLS factors was selected as recommended by the default settings of the software. Calibration models were developed from samples of the first progenies and selections plots and externally validated with samples of the second progenies and selections plots respectively, to assess the performance of calibrations across different populations of samples.
The correlation between actual and predicted constituent values (r), the bias, the root mean square error of cross-validation (RMSECV) and the root mean square error of prediction (RMSEP) on a separate sample set were used to test the performance of calibrations. The residual predictive deviation (RPD), defined as the ratio of the standard deviation of a given constituent to the RMSECV or RMSEP for the same constituent, was also determined.
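To make the calibration and full cross-validation procedure concrete, the sketch below fits a single-response PLS model with the NIPALS algorithm and runs leave-one-out cross-validation in plain Python. It is not the Unscrambler implementation: the function names, the tiny data set and the fixed number of factors are all hypothetical, chosen only so the example is self-contained.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def fit_pls1(X, y, n_factors):
    """Fit PLS1 by NIPALS on mean-centered data; returns means and factors."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # X residuals
    f = [v - y_mean for v in y]                                # y residuals
    factors = []
    for _ in range(n_factors):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:          # nothing left to explain
            break
        w = [v / norm for v in w]                 # weight vector
        t = [dot(row, w) for row in E]            # scores
        tt = dot(t, t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = dot(f, t) / tt                        # y loading
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        factors.append((w, p, q))
    return x_mean, y_mean, factors

def predict_pls1(model, x):
    """Predict one sample by repeating the deflation steps on its spectrum."""
    x_mean, y_mean, factors = model
    e = [a - b for a, b in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in factors:
        t = dot(e, w)
        y_hat += q * t
        e = [a - t * b for a, b in zip(e, p)]
    return y_hat

def leave_one_out(X, y, n_factors):
    """Full cross-validation: each sample is predicted by a model fitted without it."""
    preds = []
    for i in range(len(X)):
        model = fit_pls1(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_factors)
        preds.append(predict_pls1(model, X[i]))
    return preds

# Hypothetical "spectra" (3 wavelengths) and a noise-free linear constituent.
X = [[1, 2, 0], [2, 1, 1], [3, 4, 1], [4, 3, 3], [5, 6, 2], [6, 5, 5]]
y = [2 * a - b + 0.5 * c + 3 for a, b, c in X]
cv_pred = leave_one_out(X, y, n_factors=3)
```

External validation follows the same pattern: fit once on the calibration plot with `fit_pls1` and call `predict_pls1` on each sample of the validation plot.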
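The performance statistics described above can be computed as in the following sketch. The sign convention for bias (predicted minus reference), the use of the sample standard deviation in the RPD, and the example values are assumptions made for illustration.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def pearson_r(a, b):
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) *
                           sum((y - mb) ** 2 for y in b))

def bias(reference, predicted):
    """Mean difference, predicted minus reference (assumed sign convention)."""
    return mean([p - r for r, p in zip(reference, predicted)])

def rmse(reference, predicted):
    """RMSECV or RMSEP, depending on where the predictions come from."""
    return math.sqrt(mean([(p - r) ** 2 for r, p in zip(reference, predicted)]))

def rpd(reference, predicted):
    """Ratio of the reference standard deviation to the prediction error."""
    return sample_sd(reference) / rmse(reference, predicted)

# Hypothetical reference and predicted oil contents (%).
reference = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
stats = {
    "r": pearson_r(reference, predicted),
    "bias": bias(reference, predicted),
    "rmsep": rmse(reference, predicted),
    "rpd": rpd(reference, predicted),
}
```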
Ranking of genotypes from both the progenies and selections validation sample sets was studied to indicate the relative utility of NIR models for selection in breeding programs.

Reference data and spectral features
Descriptive statistics for oil and moisture content of olive fruit samples by origin (progenies and selections plots) and by use in the development of NIR models (calibration and validation samples) are shown in Table 1. In general terms, wide variability was obtained for both oil and moisture content, with an overall range of variation from 4.56 to 33.7% for oil content and from 34.41 to 70.11% for moisture. The range of variation was higher for calibration samples due to the inclusion of genotypes from open pollination of wild olives in both progenies and selections plots but not in validation samples. Wild olives are characterized by extremely low oil content, which expands the range of variation for this character in calibration samples.
The average raw spectrum of olive fruit samples shows characteristic absorption bands of water around 1,450 and 1,950 nm and of oil around 1,200 and 1,750 nm (Figure 1). After transformation, the second derivative spectra show troughs where the original raw spectra have peaks, with defined minima for oil absorption bands at 1,164, 1,190, 1,212, 1,710, 1,730 and 1,764 nm, which allow clear differentiation between high and low oil content samples (frames zoomed in Figure 1).
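The statement that second derivative spectra show troughs where raw spectra have peaks can be checked with a small sketch. A simple gap second difference is used here as an assumption; the text does not say which derivative algorithm was applied, and the synthetic band (a Gaussian at 1,730 nm on a sloping baseline) is hypothetical.

```python
import math

def second_derivative(spectrum, gap=1):
    """Gap second difference: a[i-gap] - 2*a[i] + a[i+gap]."""
    return [spectrum[i - gap] - 2 * spectrum[i] + spectrum[i + gap]
            for i in range(gap, len(spectrum) - gap)]

# Synthetic absorbance: linear baseline plus one absorption band at 1,730 nm.
wavelengths = list(range(1700, 1761))
raw = [0.5 + 0.001 * (wl - 1700) + 0.2 * math.exp(-((wl - 1730) / 8.0) ** 2)
       for wl in wavelengths]

gap = 5
d2 = second_derivative(raw, gap)
d2_wavelengths = wavelengths[gap:len(wavelengths) - gap]

peak_wl = wavelengths[raw.index(max(raw))]        # maximum of the raw band
trough_wl = d2_wavelengths[d2.index(min(d2))]     # minimum of the derivative
```

The linear baseline contributes nothing to the second difference, which is one reason derivative spectra make overlapping bands easier to compare between samples.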

NIR models in samples from progenies
Cross-validation and prediction results for calibration models developed from samples collected in progenies plots are shown in Table 2. Correlation values between determined and predicted constituent values higher than 0.9 and RPD values higher than or close to 2.5 were obtained for oil and moisture content in both cross-validation and prediction. It should be noted that full cross-validation (i.e. leave-one-out) gave results similar to those of external validation with different populations of samples not previously used to develop the calibrations, which indicates the accuracy and robustness of the models. Optimal models included 9 and 7 latent variables for oil and moisture content respectively, and regression coefficients showed opposite trends for the two characters in most of the wavelength region (Figure 2), which can be attributed to the negative relationship between them. In fact, a significant negative correlation between oil and moisture content was observed (r = -0.59; p < 0.001). Figure 3 shows predicted vs. reference plots for oil and moisture content in validation samples from progenies plots. The mean ± SD values in each case were selected as boundaries, which for normally distributed data enclose approximately 68% of the observations. No sample ranked above the upper boundary by one method (reference or predicted values) showed a value lower than the mean by the other method, and most of them were also above the upper boundary: 6 out of 9 samples with reference values for oil content higher than mean + SD (potentially selected genotypes) also showed predicted values higher than mean + SD, and 6 out of 7 for moisture content. Reciprocally, no sample ranked below the lower boundary by one method, which could potentially be discarded when selecting for high oil content, showed a value higher than the mean by the other method.
For oil and moisture content, 11 out of 12 and 9 out of 10 samples respectively with reference values lower than mean - SD also showed predicted values lower than mean - SD. Similarly, a Spearman rank correlation test performed on the genotype rankings showed highly significant rank correlation between predicted and reference values for both oil content (r = 0.85, p < 0.001) and moisture content (r = 0.92, p < 0.001) and, therefore, correct ranking of genotypes by NIR prediction.
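The two ranking checks described above (agreement of the mean ± SD classes and Spearman rank correlation) can be sketched as follows. The data are hypothetical, the sample standard deviation is assumed, and the rank correlation is computed as the Pearson correlation of average ranks, without the significance test.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def classify(values):
    """Label each value 'high', 'mid' or 'low' using mean ± SD boundaries."""
    m, sd = mean(values), sample_sd(values)
    return ["high" if v > m + sd else "low" if v < m - sd else "mid"
            for v in values]

def ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(a, b):
    """Spearman coefficient as the Pearson correlation of the ranks."""
    ra, rb = ranks(a), ranks(b)
    ma, mb = mean(ra), mean(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in ra) *
                           sum((y - mb) ** 2 for y in rb))

# Hypothetical oil contents (%) for eight genotypes.
reference = [12.0, 15.5, 18.0, 20.5, 22.0, 24.5, 27.0, 31.0]
predicted = [13.0, 14.5, 19.0, 19.5, 23.5, 23.0, 28.0, 30.0]

ref_classes = classify(reference)
pred_classes = classify(predicted)
# Genotypes above the upper boundary by the reference method that are
# also above the upper boundary by NIR prediction.
high_agreement = sum(r == "high" and p == "high"
                     for r, p in zip(ref_classes, pred_classes))
rho = spearman(reference, predicted)
```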

NIR models in samples from selections
Models developed from samples collected in selections plots provided slightly better results than those described above for samples collected in progenies plots, with higher values of r, lower values of RMSECV and RMSEP and, consequently, higher RPD values (Table 2). Analysis of variance showed significant differences between genotypes for oil and moisture content in both reference and NIR predicted data externally validated with samples of the second selection plot (Tables 3 and 4). The distribution of sums of squares between and within genotypes was almost the same in reference and NIR predicted data, with around 80-85% and 15-20% due to differences between and within genotypes respectively. The ranking of genotypes calculated from reference and NIR predicted data was also quite similar, with average values for two genotypes (Selections 6 and 9) clearly showing the highest oil content and the lowest moisture.
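The partition of the sum of squares between and within genotypes can be illustrated with a one-way layout. The sketch below assumes a simple one-way analysis of variance with trees as replicates; the selection names and values are hypothetical, and the same function would be applied once to reference data and once to NIR predicted data.

```python
def partition_sum_of_squares(groups):
    """Return (SS between, SS within) for a dict of genotype -> tree values."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = 0.0
    ss_within = 0.0
    for values in groups.values():
        group_mean = sum(values) / len(values)
        ss_between += len(values) * (group_mean - grand_mean) ** 2
        ss_within += sum((v - group_mean) ** 2 for v in values)
    return ss_between, ss_within

# Hypothetical oil contents (%) for three selections with four trees each.
oil_by_selection = {
    "Selection A": [18.0, 20.0, 19.0, 21.0],
    "Selection B": [24.0, 26.0, 25.0, 27.0],
    "Selection C": [30.0, 29.0, 31.0, 32.0],
}
ss_between, ss_within = partition_sum_of_squares(oil_by_selection)
percent_between = 100.0 * ss_between / (ss_between + ss_within)
```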

Discussion
In this work, two independent experiments involving plant material from different selection steps of an olive breeding program were carried out to determine the possible application of portable NIR technology for the determination of oil and moisture contents, the two main components of olive fruit. The first step in olive breeding involves initial progeny populations, which must be evaluated for the agronomic traits of interest. The results obtained in this work suggest that NIR can be used for on-tree prediction of fruit moisture and oil content in intact olives with accuracy similar to that previously reported using laboratory (not portable) instruments (León et al., 2004; Ayora-Canada et al., 2005; Dupuy et al., 2010). Using portable instruments under laboratory conditions, Cayuela et al. (2009) obtained r values of 0.63 and 0.88 and RPD values of 1.47 and 2.46 in cross-validation for oil content and moisture respectively, working with the same instrument used in this work, although samples from a single batch of a single cultivar were analyzed. Using a different portable instrument, also under laboratory conditions, Cayuela & Pérez-Camino (2010) obtained r values of 0.78 and 0.76 and RPD values of 2.77 and 2.51 for oil content and moisture respectively in cross-validation. The correlation values between actual and predicted constituent values obtained in this work (higher than 0.9) indicate a good fit of the predictive models. From a practical point of view, the models are accurate enough for ranking genotypes and discriminating them into high, medium and low content and, therefore, useful for selecting interesting genotypes (Fassio & Cozzolino, 2004). More precise results will probably be needed for other uses such as industrial applications.
RPD values between 2 and 2.5 indicate that coarse quantitative predictions are possible, and values between 2.5 and 3 or above correspond to good and excellent prediction accuracy, respectively (Nicolaï et al., 2007). It should be noted that even more accurate results are likely to be possible by developing models from spectral and reference data obtained from single fruits instead of pooled samples, as indicated for instance for maize or soybean seed composition (Baye et al., 2006; Lee et al., 2010), provided accurate reference methods for single fruit analysis are available. In olive, Cayuela et al. (2009) obtained poor results for the prediction of oil content in individual fruits, probably due to the low accuracy of the hexane:isopropanol extraction reference analysis.
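The RPD interpretation ranges quoted above can be written as a small lookup. The handling of the exact boundary values and the label for values below 2 are assumptions, since the text gives no category for that range; the example values are the cross-validation RPDs cited in the discussion.

```python
def rpd_category(rpd):
    """Map an RPD value to the interpretation ranges quoted in the text."""
    if rpd > 3.0:
        return "excellent"
    if rpd >= 2.5:
        return "good"
    if rpd >= 2.0:
        return "coarse quantitative"
    return "below the ranges given in the text"  # assumption: no label stated

# RPD values cited in the discussion for previous portable NIR work.
cited = {"oil, Cayuela et al. (2009)": 1.47,
         "moisture, Cayuela et al. (2009)": 2.46,
         "oil, Cayuela & Pérez-Camino (2010)": 2.77,
         "moisture, Cayuela & Pérez-Camino (2010)": 2.51}
categories = {label: rpd_category(value) for label, value in cited.items()}
```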
After the first evaluation step in initial progeny populations, the selected genotypes are propagated for further evaluation in comparative field trials. The results obtained in this work suggest again that NIR can be used as a decision support tool at this step of the breeding process. Significant differences in oil and moisture content were obtained between genotypes, with similar grouping for reference and NIR predicted results (Table 4). Moreover, the analysis of variance between and within genotypes provided a similar distribution of sums of squares for reference and predicted data, which indicates that NIR predicted results could be efficiently used for genotype × environment and heritability studies, as suggested in other crops (Welle et al., 2005; Posada et al., 2009). Models developed from samples collected in selections plots provided slightly more accurate results than those obtained from samples collected in progenies plots. These differences could be attributed in part to differences in the sample populations themselves, although wide variability for the evaluated characters was available in both cases. The main difference can probably be attributed to the different conditions under which spectral data were acquired. Spectral data of samples collected in selections plots were obtained under constant laboratory room conditions, while on-tree fruit spectral data were used for samples collected in progenies plots. Higher interference from the environment, such as ambient light and fluctuating temperatures, must be expected under on-field conditions (Nicolaï et al., 2007). Several works comparing NIR model performance under laboratory vs. on-field conditions for other foodstuffs reached similar conclusions with regard to the accuracy of the models (Kusumiyati et al., 2008; Perez-Marin et al., 2009).
In any case, the advantages of on-field application, which avoids the need for sample manipulation or transportation to the laboratory, could be particularly interesting in breeding programs, in which large numbers of samples from different trials in different locations must be tested every year.
In conclusion, the results obtained indicate that portable NIR spectroscopy could be used for evaluating oil content and moisture in different selection steps of olive breeding programs, providing sufficiently accurate results for the selection of genotypes. This selection can be carried out directly on intact olive fruits on-tree, without interfering with fruit development, avoiding the need for sample transportation to the laboratory and allowing repeated measurements across the ripening season.
Olive breeding programs can benefit from all these advantages to improve the efficiency of the evaluation process. These results could also be useful in other research where these fruit characters must be evaluated, provided that predictive errors are low enough for the intended application. Further work will be necessary to develop calibrations that include data covering an appropriate range of instrumental, biological and environmental conditions, in order to build more robust models applicable independently of external factors. Calibrations for additional constituents could also be added in future work for simultaneous selection of genotypes for oil quality components.
Different letters indicate significant differences at p < 0.05. Ranking order in brackets. Selections are presented in ascending order of reference oil content.