Potential of VIS/NIR spectroscopy to detect and predict bitter pit in ‘Golden Smoothee’ apples

Aim of study: A portable VIS/NIR spectrometer and chemometric techniques were combined to identify bitter pit (BP) in Golden apples.
Area of study: Worldwide.
Material and methods: Three classification algorithms, namely linear discriminant analysis (LDA), quadratic discriminant analysis (QDA) and support-vector machine (SVM), were used in two experiments. In experiment #1, VIS/NIR measurements were carried out at postharvest on apples previously classified into 3 classes (class 1: non-BP; class 2: slight symptoms; class 3: severe symptoms). In experiment #2, VIS/NIR measurements were carried out on healthy apples collected before harvest to determine the capacity of the classification algorithms to detect BP prior to the appearance of symptoms.
Main results: In experiment #1, VIS/NIR spectroscopy showed great potential for detecting pitted apples with visible symptoms (accuracies of 75–81%). The linear classifier LDA performed better than the multivariate non-linear QDA and SVM classifiers in discriminating between healthy and bitter pitted apples. In experiment #2, the accuracy in predicting bitter pit prior to the appearance of visible symptoms decreased to 44–57%.
Research highlights: The identification of apples with bitter pit through VIS/NIR spectroscopy may be due to chlorophyll degradation and/or changes in intercellular water in fruit tissue.
experiments: ET and SA. Performed the experiments, analyzed the data and wrote the paper: ET. Supervised the work and reviewed the paper: SA and IR. All authors read and approved the final manuscript.
Torres, Alegre, S (2021). Potential of VIS/NIR spectroscopy to detect and predict bitter pit in ‘Golden Smoothee’ apples. Spanish


Introduction
Bitter pit is considered one of the key physiological disorders in apple crops, causing serious losses in certain apple varieties. The symptoms are characterized by depressions on the skin, generally located at the calyx end of the fruit. The tissue under these depressed areas is darkened, dry and corky with a bitter taste (Ferguson & Watkins, 1989; Jemrić et al., 2016). Traditionally, research studies have related bitter pit to calcium deficiency and to the balance between calcium and other nutrients such as nitrogen, potassium, or magnesium (Ferguson & Watkins, 1989; Amarante et al., 2013; de Freitas et al., 2015). However, some works have shown a lack of relationship between calcium content and bitter pit, indicating that the disorder could be affected by other causes such as climate and/or growing conditions (Saure, 1996, 2005, 2014; Lotze et al., 2008; Torres et al., 2017b).
Although the process that causes bitter pit usually starts during the period of fruit growth, the symptoms may not be evident in the orchard and generally appear during fruit storage or transport. This can result in extensive losses associated with labor and packing costs (Val et al., 2010; Kafle et al., 2016). Identifying fruit prone to bitter pit before export or shipment would help to reduce economic losses caused by market rejection (Nicolaï et al., 2006; Torres et al., 2015). Mineral analyses based on calcium levels within the fruit, or approaches which force the appearance of symptoms before they naturally occur, have been studied by different researchers to predict bitter pit (Torres et al., 2010, 2015, 2021; Jemrić et al., 2016; Kalcsits, 2016). The drawbacks of these techniques include the fact that they are destructive, which means the same fruit cannot be monitored over time, and a delay of between 5 and 14 d before obtaining the results.
Measurements of spectral reflectance in the visible and near-infrared regions (VIS/NIR) using spectroscopy technology have several advantages over destructive methods (e.g. mineral analysis): they are rapid, repeatable and chemical-free, and can measure multiple attributes simultaneously. These advantages could help to overcome the difficulties described above. VIS/NIR radiation penetrates the object, and changes in the spectral characteristics due to scattering and absorption depend on its chemical composition. These spectral changes provide information about microstructural properties of the object, including stiffness and internal damage. Many studies have explored the possibility of using VIS/NIR spectroscopy for studying quality and disorders or for detecting defects in apples (Upchurch et al., 1990; Mehl et al., 2002; Ariana et al., 2006; Xing et al., 2006; Paz et al., 2009).
As for bitter pit detection and/or prediction, very few studies have been published. Lotze (2005) made a classification between healthy and bitter pitted 'Braeburn' apples using fluorescence imaging with an accuracy of 75-100%. In another study, Nicolaï et al. (2006) used a line scan near-infrared camera with a spectrograph to capture spectral images and successfully identify visible and non-visible bitter pit lesions. Similarly, Ariana et al. (2006) presented an imaging model of reflectance and fluorescence to differentiate between pitted and non-pitted apples with accuracies of up to 87%. Recently, Kafle et al. (2016) and Jarolmasjed et al. (2017) distinguished bitter pitted apples by means of NIR spectrometry with an average accuracy in the range of 70-100%.
Most of the above-mentioned studies used laboratory equipment in the NIR region (780-2500 nm) and are of little practical use in the field or packing houses. In recent years, as the result of rapid technological developments, handheld VIS/NIR spectrometers have been designed specifically to control and monitor the quality and maturity of different fruits (León-Moreno, 2012). These spectrometers are portable and compact, but usually offer measurements only in limited wavelength ranges (<1100 nm). According to previous studies, these wavelength ranges could be enough to detect some disorders in apples. ElMasry et al. (2008) developed a hyperspectral imaging system based on a spectral region between 400 and 1000 nm for early detection of bruises on apples. Kleynen et al. (2005) observed that the 750 and 800 nm bands offered good contrast for detecting internal tissue damage such as hail damage and bruises.
The present study aims to evaluate VIS/NIR spectroscopy technology for detecting and predicting bitter pit in apples. The specific goals of the study were: (1) to determine the capacity to classify different severity levels of bitter pit; (2) to determine the capacity to classify healthy and bitter pitted apples carrying out one or two measurements per fruit; and (3) to determine the capacity to predict bitter pit prior to the appearance of symptoms. To our knowledge, no results related to these specific objectives have been published to date.

Plant material and selection of samples
'Golden Smoothee' apples (Malus domestica Borkh. L) were harvested on September 10th, 2015, and September 13th, 2016, from a bitter-pit-prone orchard located at Gimenells (Lleida, NE Spain). The orchard was planted in 1994 and trees were grafted onto M9 rootstock. Plant and row spacing were 1.4 m and 4 m, respectively (1786 trees/ha). Fertigation was applied through drip irrigation. The soil was characterized as a calcareous loam with excellent drainage characteristics. Trees were managed according to the guidelines for apple integrated production, including the application of mineral fertilizers that were estimated to cover the nutrient requirements.

Sampling of apples with visible bitter pit symptoms (experiment #1)
In 2015, average-sized apples (80-85 mm in diameter) at commercial harvest (September 2nd) were collected and placed in packaged fruit boxes and stored in cold storage at 0 °C for 4 months. Bitter pit was then evaluated using a category scale with 3 classes of bitter pit depending on the amount of pitted area (Fig. 1): class 1 with no bitter pit symptoms; class 2 with slight symptoms (fruit having 1-5 pits on the surface); class 3 with moderate/severe symptoms (more than 5 pits per fruit). Two 276-apple samples were selected. Each sample comprised 100 apples of class 1, 76 apples of class 2 and 100 apples of class 3 (Table 1). Two VIS/NIR measurements were carried out on each fruit (see section Data collection).

Sampling of apples when bitter pit symptoms were not visible (experiment #2)
In 2016, healthy apples were collected at 20 d before harvest (at preharvest) and at commercial harvest (at harvest, September 8th). One average-sized (70-75 mm and 75-80 mm in diameter, respectively) and undamaged apple was taken from different trees with standard crop loads and vigor at a height of 130-170 cm above the ground. The apples collected were subsequently placed on plastic fruit trays. Then, VIS/NIR measurements were carried out on each fruit when no symptom was visible (see section Data collection).
After the VIS/NIR measurements, the apples collected at preharvest were left at room temperature (~ 25 °C) to allow the development of bitter pit symptoms according to the passive method (Torres et al., 2015). After 7 d, bitter pit was evaluated using a binary-class classification: class 1 and class 3 (apples classified as class 2 were discarded for this experiment). Then, two balanced 40-apple samples were selected for bitter pit prediction at preharvest.
The apples collected at commercial harvest were kept, after the VIS/NIR measurements, in cold storage at 0 °C for 4 months; after this, bitter pit was evaluated using a binary-class classification to separate apples without visual symptoms (class 1) from apples with visible BP symptoms (class 3). Two 40-apple samples were selected for bitter pit prediction at harvest; each sample comprised 20 apples of class 1 (asymptomatic apples) and 20 apples of class 3 (symptomatic apples) (Table 1). Two VIS/NIR measurements were carried out on each fruit (see section Data collection).

Data collection
The spectral absorbance data were collected under laboratory conditions using a UT-5001 portable handheld VIS/NIR spectrometer (UT instruments, Lugo, Ravenna, Italy) with a measurement range of 650-950 nm. This range covers the red-edge and near-infrared regions and is highly sensitive to physiological factors such as changes in pigments, water, carbohydrates, chlorophyll content and fluorescence (Kurenda et al., 2014). The spectral resolution of the spectrometer was 2 nm. Fruits were equilibrated at room temperature for approximately half a day before spectral acquisition. A blank scan was performed using Teflon® before starting the spectral measurements of each sample. The VIS/NIR measurements were carried out from two opposite sides along the equator of the fruit.

Figure 1. Category scale with three classes of bitter pit depending on the amount of pitted area: class 1 with no bitter pit symptoms; class 2 with slight symptoms (fruit having 1-6 pits on the surface); class 3 with moderate/severe symptoms (more than 6 pits per fruit).

Table 1 footnotes. 1 Class 1: healthy apples without symptoms; class 2: bitter pit apples with slight symptoms (1-6 pits); class 3: bitter pit apples with severe symptoms (> 7 pits). 2 Two spectral data collected per fruit from two opposite sides. 3 DBH: days before harvest.

In experiment #1, when symptoms were visible, the first measurement was carried out on the most affected side, on a healthy area close to the symptoms (symptomatic side); the second measurement was carried out on a healthy area of the opposite side (non-symptomatic side).
For each experiment, two datasets were obtained from the measurement of each sample. The two datasets were combined to develop a new set of data. Finally, the spectral absorbance data of each experiment were analyzed as three datasets (Table 1).
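The way the three datasets are assembled can be sketched as follows. This is only an illustration: the text does not state how the two per-side datasets were merged, so the sketch assumes that "combined" means pooling the spectra of both sides as separate observations. The names `build_datasets`, `side_a` and `side_b`, and all spectral values, are hypothetical.

```python
from typing import Dict, List, Tuple

Spectrum = List[float]
Sample = Tuple[int, Spectrum]  # (bitter pit class, absorbance values)


def build_datasets(side_a: List[Sample], side_b: List[Sample]) -> Dict[str, List[Sample]]:
    """Return three datasets: one per fruit side plus their combination.

    Assumption: the combined dataset pools the spectra of both sides as
    separate observations; the source does not spell out the merge rule.
    """
    if len(side_a) != len(side_b):
        raise ValueError("each fruit needs one spectrum per side")
    return {
        "side_a": list(side_a),
        "side_b": list(side_b),
        "both_sides": list(side_a) + list(side_b),
    }


# Toy example: 3 fruits, 4 wavelengths each (values are made up).
side_a = [(1, [0.9, 0.8, 0.4, 0.5]), (3, [1.1, 0.9, 0.5, 0.6]), (1, [0.8, 0.7, 0.4, 0.5])]
side_b = [(1, [0.9, 0.8, 0.4, 0.5]), (3, [1.0, 0.9, 0.5, 0.6]), (1, [0.8, 0.8, 0.4, 0.5])]
datasets = build_datasets(side_a, side_b)
sizes = {name: len(rows) for name, rows in datasets.items()}
print(sizes)
```

Under this assumption the combined dataset holds twice as many observations as each single-side dataset, which matters when comparing accuracies across the three.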

Experiment #1: identification of bitter pitted apples
Each dataset was analyzed to evaluate the suitability of the spectral feature extraction in discriminating the identified class of bitter pit using a multiclass classification (class 1: non-BP; class 2: slight symptoms; class 3: severe symptoms) or a binary-class classification (classes 1 and 3). Differences between classification accuracies using measurements from different fruit sides (symptomatic, non-symptomatic and both fruit sides) were also analyzed. Each dataset was separated into a balanced training and testing dataset. The ratio of each dataset was 7:3 for model development and validation, respectively. The spectral feature extractions were evaluated using the algorithms LDA (linear discriminant analysis), QDA (quadratic discriminant analysis) and SVM (support-vector machine). The software used was The Unscrambler® (version 10.4; Camo Process AS, Oslo, Norway). The datasets were randomized three times each for evaluation of classifier performance.
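The 7:3 partition with three randomizations can be sketched in plain Python. The analysis itself was run in The Unscrambler, so this is not the authors' code. The sketch assumes that "balanced" means each class keeps the same training/testing ratio (a stratified split). The function name `stratified_split` and the seeds are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(labels: Sequence[int], train_fraction: float, seed: int) -> Tuple[List[int], List[int]]:
    """Split sample indices so that every class keeps the same train/test ratio."""
    by_class: Dict[int, List[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    train: List[int] = []
    test: List[int] = []
    for label in sorted(by_class):
        indices = by_class[label][:]
        rng.shuffle(indices)
        cut = round(train_fraction * len(indices))
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return train, test


# Class sizes of one experiment #1 sample: 100 / 76 / 100 apples.
labels = [1] * 100 + [2] * 76 + [3] * 100

# Three independent randomizations (the seeds are arbitrary).
splits = [stratified_split(labels, 0.7, seed) for seed in (0, 1, 2)]
train, test = splits[0]
print(len(train), len(test))
```

The resulting index lists could then be used to fit and score any classifier, for instance the `LinearDiscriminantAnalysis`, `QuadraticDiscriminantAnalysis` and `SVC` estimators of scikit-learn, although that library was not used in this study.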

Experiment #2: prediction of bitter-pit-prone apples
Each dataset was analyzed to evaluate the suitability of the spectral feature extraction in predicting the appearance of bitter pit, at preharvest and harvest, before the appearance of symptoms on the fruit surface. For this experiment, a binary classification was used (apples of class 1 and class 3). The VIS/NIR measurements were carried out from two opposite sides along the equator of the fruit. The spectral feature extractions were evaluated using three classification algorithms (LDA, QDA and SVM) for each case. Each dataset was separated into a balanced training and testing dataset as described above (training-testing ratio of 7:3).

Statistical analysis
Analyses were performed in SAS 9.2 (SAS Institute Inc., 2009). For experiment #1, a binary logistic regression analysis with a three-way interaction was performed to test the main effects of 'algorithm' (LDA, QDA and SVM classifier models), 'bitter pit class' (1, 2, 3, overall or 1, 3, overall) and 'fruit side' (symptomatic, non-symptomatic and both fruit sides), and their interactions, on classification accuracies (percentage of correctly classified apples). For experiment #2, a binary logistic regression analysis with a two-way interaction was performed, for the preharvest and harvest data, to test the main effects of 'algorithm' and 'bitter pit class' (1, 3, overall), and their interaction, on prediction accuracies.
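A sketch of the model behind the experiment #1 test, assuming a standard logit link with categorical factors. The symbols are introduced here for illustration only and are not taken from the SAS code, which is not reported.

```latex
\operatorname{logit}(p_{ijk}) = \ln\frac{p_{ijk}}{1 - p_{ijk}}
  = \mu + A_i + C_j + S_k + (AC)_{ij} + (AS)_{ik} + (CS)_{jk} + (ACS)_{ijk}
```

Here $p_{ijk}$ is the probability that an apple is correctly classified with algorithm $i$, in bitter pit class $j$, measured on fruit side $k$; $A$, $C$ and $S$ are the main effects and the bracketed terms their interactions. For experiment #2 the same form would apply without the terms involving $S$.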

Multiclass classification accuracies
No significant differences were found between multiclass classification accuracies from different measured fruit sides (symptomatic, non-symptomatic and both fruit sides). Nevertheless, the Chi-square test indicated a significant 'algorithm × class' interaction effect on classification accuracies (Table 2). Because of this significant interaction, algorithms were compared in each individual class (class 1, class 2 and class 3) and 'overall'.
The average individual (class 1, 2 and 3) and overall multiclass classification accuracies, using the different classifier models (LDA, QDA and SVM), are shown in Fig. 2. The average overall classification accuracy was significantly higher using the LDA classifier than the QDA and SVM classifiers (50% vs. 42%). No significant differences between algorithms were found for class 1 (mean accuracy of 51%). The three classifier models (LDA, QDA and SVM) showed the lowest accuracies in class 2; LDA yielded an average class 2 classification accuracy significantly higher than QDA and SVM, and QDA obtained an accuracy significantly higher than SVM (37, 22 and 1%, respectively). For class 3, the average classification accuracy was significantly higher for SVM (65%) than LDA and QDA; no significant differences were found between LDA and QDA (52%).
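As an illustration of how individual-class and overall accuracies (percentage of correctly classified apples) relate to each other, the following sketch derives both from a confusion matrix. The counts are made up for a hypothetical 30/23/30 test set and are not data from this study; `class_accuracies` is a hypothetical helper.

```python
from typing import Dict, List


def class_accuracies(confusion: List[List[int]], class_names: List[str]) -> Dict[str, float]:
    """Percentage of correctly classified apples per class and overall.

    confusion[i][j] counts apples of true class i predicted as class j.
    """
    result: Dict[str, float] = {}
    correct = 0
    total = 0
    for i, name in enumerate(class_names):
        row_total = sum(confusion[i])
        result[name] = 100.0 * confusion[i][i] / row_total
        correct += confusion[i][i]
        total += row_total
    result["overall"] = 100.0 * correct / total
    return result


# Made-up counts (rows: true class 1, 2, 3; columns: predicted class 1, 2, 3).
confusion = [
    [15, 8, 7],
    [9, 8, 6],
    [6, 9, 15],
]
acc = class_accuracies(confusion, ["class 1", "class 2", "class 3"])
for name, value in acc.items():
    print(f"{name}: {value:.1f}%")
```

Because the overall figure weights each class by its number of test apples, a poorly recognized class lowers it in proportion to that class's share of the test set.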

Binary classification
Binary classification showed significantly higher accuracies than multiclass classification, independently of the classifier model. No significant differences were found for the effect of different fruit sides (symptomatic, non-symptomatic and both fruit sides) on binary classification accuracies (Table 2). As in the multiclass classification, the Chi-square test indicated a significant 'algorithm × class' interaction effect on classification accuracies. Because of this significant interaction, algorithms were compared in each individual class (healthy class and bitter pit class) and overall.
The average individual (class 1 and 3) and overall binary-class classification accuracies, using the different classifier models (LDA, QDA and SVM), are shown in Fig. 2. As in the multiclass classification, the average overall classification accuracy was significantly higher using the LDA classifier than the QDA and SVM classifiers; QDA produced an overall classification accuracy significantly higher than SVM (75%, 65% and 57%, respectively).
The average class 1 (asymptomatic apples) classification accuracy was also significantly higher using LDA than QDA and SVM, and QDA showed a significantly higher accuracy than SVM (81%, 59% and 49%, respectively). The classification accuracy of class 3 (symptomatic apples) using QDA was significantly higher than when using LDA or SVM (71% vs. 67%); no significant differences between LDA and SVM were observed. The average accuracy for class 1 was significantly higher than for class 3 using LDA (81% vs. 68%), whereas using QDA and SVM the accuracies were higher for class 3 than for class 1 (59% vs. 71% and 49% vs. 65%).
These results indicate that the LDA classifier yielded a higher number of false positives (healthy apples identified as bitter pitted apples), whereas the QDA and SVM classifiers yielded a higher number of false negatives (bitter pitted apples classified as healthy). A visual inspection of the absorbance curves indicated that the greatest differences between pitted (class 3) and non-pitted (class 1) apples were in the VIS region of 650-700 nm and in the NIR region of 900-950 nm (Fig. 4).

Table 2. p-values of Chi-square test from logistic regression analysis to test the main effects on classification accuracies (percentage of correctly classified apples) of 'algorithm' (LDA, QDA and SVM classifier models), 'bitter pit class' (1, 2, 3, overall or 1, 3, overall) and 'fruit side' (symptomatic, non-symptomatic and both fruit sides), and their interactions, using VIS/NIR data when bitter pit symptoms were visible.

Figure 2. Comparison of average LDA-, QDA- and SVM-based multiclass (above) and binary (below) classification accuracies (percentage of correctly classified apples), for each bitter pit class classification (class 1, class 2, class 3 and overall), using VIS/NIR data when bitter pit symptoms were visible. Error bars indicate the standard error of the mean. Class 1: no bitter pit symptoms; class 2: slight/moderate symptoms (1-6 pits on the surface); class 3: severe symptoms (more than 7 pits per fruit).

Experiment #2
Within each prediction time (preharvest and harvest), no significant differences in classification accuracies were found between the different classifier models or the different classification classes (Table 3). The average individual (class 1 and 3) and overall classification accuracies, using the different classifier models (LDA, QDA and SVM), are shown in Fig. 4 for both prediction times (at preharvest and at harvest, respectively). At both prediction times, the classification accuracies prior to the appearance of bitter pit were lower than when bitter pit was visible, independently of the classifier model. The average overall accuracies were 44% (LDA), 40% (QDA) and 51% (SVM) at preharvest, and 57% (LDA), 58% (QDA) and 47% (SVM) at harvest, with no significant difference between the different classifier models (Fig. 4). These results indicate that VIS/NIR spectra were not capable of accurately detecting bitter-pit-prone fruit.
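To put accuracies close to 50% in context for a balanced two-class problem, the following sketch computes how often a classifier guessing at random would reach a given score. It assumes a test set of 12 apples (30% of one 40-apple sample), which is inferred from the 7:3 split rather than reported in the text; `prob_at_least` is a hypothetical helper and the calculation is not part of the original analysis.

```python
from math import comb


def prob_at_least(correct: int, n: int, p_chance: float = 0.5) -> float:
    """P(X >= correct) for X ~ Binomial(n, p_chance)."""
    return sum(
        comb(n, k) * p_chance**k * (1 - p_chance) ** (n - k)
        for k in range(correct, n + 1)
    )


n_test = 12  # assumption: 30% of one 40-apple sample
# 7 correct out of 12 is about 58%, similar to the best overall accuracies at harvest.
p_value = prob_at_least(7, n_test)
print(f"P(at least 7 of {n_test} correct by chance) = {p_value:.3f}")
```

Under these assumptions, a score of this size is reached by random guessing in a substantial share of cases, which is consistent with the conclusion that bitter-pit-prone fruit could not be accurately detected before symptoms appeared.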

Discussion
The visual inspection of the absorbance curves indicated that the greatest differences between pitted and non-pitted apples were in the VIS region of 650-700 nm and in the NIR region of 900-950 nm. A fruit spectrum in this wavelength range is strongly affected by the presence of the red-light-absorbing skin pigment chlorophyll, which determines the color characteristics of the fruit (Abbott et al., 2010). Chlorophyll a absorbs light at ~660 nm, while chlorophyll b absorbs at ~640 nm. As the fruit matures, the chlorophyll content decreases and so does the red-light absorption, with the consequent loss of green coloring. The reflected red wavelengths, combined with reflected green wavelengths, are perceived by the human eye as yellow. In this respect, immature fruit are more susceptible to bitter pit than fruit harvested at the proper maturity (Prange et al., 2011). Hence, the skin chlorophyll content could be a reliable measure of bitter pit risk. Changes in this wavelength range are also used as indicators of stress or nutrient deficiencies. Plant stress is typically accompanied by a reduction or shutdown in photosynthesis, the effect of which is a reduction in the absorption (i.e. a higher reflectance) of blue and red wavelengths. In short, changes in the absorption range around 650-700 nm may reflect a combination of fruit physiology and maturity and hence could be interpreted as indicative of bitter pit. The NIR region of 900-950 nm is associated with water content or dry matter. ElMasry et al. (2008) defined the absorption valleys in the NIR at 840-960 nm in apples as sugar and water absorption bands. Similar findings were reported in pears by Travers et al. (2014), who observed that the NIR region between 900 and 970 nm was important for dry matter and soluble solids.
In our case, the differences observed in this NIR region could be attributed to the fact that in bruised areas, such as bitter pit-like symptoms, water replaces the intercellular air spaces in the plant tissue and, consequently, could cause a decrease in NIR-reflectance of these areas.
Multiclass classification overall accuracies ranged from 42% (QDA and SVM) to 50% (LDA). Class 2 (slight symptoms, with fruit having 1-5 pits on the surface) showed accuracies significantly lower than the other two classes and the overall accuracies, independently of algorithm (1-37%). The imbalance of the class distribution in the training data (100, 76, 100) could have led the classification algorithms to overestimate the majority classes (class 1 and 3). The lack of accuracy could also be attributed to the low degree to which the apples were affected. In order to reduce these errors, we rejected class 2 apples for the binary-class classification.
The binary-class classification resulted in accuracies significantly higher than for multiclass classification. The overall accuracy of the binary-class classification was 75% using the LDA classifier model. These results are similar to those obtained by other researchers for discriminating between healthy and bitter pitted apples using a binary-class classification (Nicolaï et al., 2006; Kafle et al., 2016; Jarolmasjed et al., 2017). The LDA classifier accuracies were significantly higher than those of the QDA and SVM classifiers, and QDA (65%) was found to yield higher accuracies than SVM (57%). Kafle et al. (2016) also reported that QDA had a higher classification accuracy than SVM for healthy and bitter pitted apples, with accuracy values similar to those obtained in the present study. The accuracy level in each class depended on the classifier model used. QDA and SVM showed higher accuracies for class 3 (bitter pitted apples), whereas classification accuracies of LDA were significantly higher for class 1 (asymptomatic apples). This indicates that LDA classifiers yielded a higher number of false positives (healthy apples identified as bitter pitted apples), whereas QDA and SVM classifiers yielded a higher number of false negatives (bitter pitted apples classified as healthy). These results contrast with those obtained in a similar experiment with Honeycrisp apples performed by Kafle et al. (2016), who obtained a significant increase for healthy-apple class accuracies compared to the bitter-pitted-apple class using QDA and SVM algorithms. Jarolmasjed et al. (2017) obtained a higher number of false negatives than false positives using partial least squares regression. Further studies with larger datasets need to be performed to validate these aspects because algorithm performance arguably depends on several factors which might be responsible for the different results (Sankaran & Ehsani, 2011; Kafle et al., 2016).
It was observed that classification accuracies did not change significantly when measuring different fruit sides (symptomatic, asymptomatic or both). VIS/NIR spectroscopy point meter readings measure a limited area (10 mm diameter circle) per measurement. This aspect may limit the application of NIR technology to detect affected apples due to the inability to take measurements on affected areas (Lotze, 2005). However, the signature of organic and complex compounds of major chemical contents associated with bitter pit (calcium, magnesium, and potassium) might exist in spectral data obtainable by VIS/NIR spectroscopy (Jarolmasjed et al., 2017). Hence, it would be possible to detect bitter pitted apples, even prior to the appearance of visible symptoms, based on the chemical contents of the fruit, independently of whether the measurement is made on the bitter pit lesion. Our results indicate that the VIS/NIR spectroscopy method offers potential for non-destructive discrimination of bitter pitted from healthy apples, independently of measured fruit area. This would facilitate the development of a model for implementation in portable handheld VIS/NIR spectrometers or in automatic apple sorting systems.
VIS/NIR spectroscopy readings were, however, unable to identify immature (preharvest) or mature (harvest) apples more prone to bitter pit development prior to the appearance of visible symptoms. Lotze (2005) proposed that the inability of VIS/NIR spectroscopy point meter readings to identify bitter-pit-prone fruit could be due to not being able to take measurements on the affected areas. However, we did not observe this limitation in experiment #1 when the bitter pit symptoms were visible. According to our results, the identification of apples with visible bitter pit through VIS/NIR spectroscopy may be due to chlorophyll degradation and/or changes in intercellular water in fruit tissue and, from our point of view, these changes would not have developed before the appearance of symptoms. We suggest that bitter-pit-prone apples could be detected through the relationship with the mineral content associated with bitter pit (calcium, magnesium, and potassium) but, according to recent studies, a higher wavelength range will be necessary for these cases. Bonomelli et al. (2020) did not observe correlations in apple tissues between calcium content and the reflectance spectrum between 285 and 1200 nm. On the other hand, Galvez-Sola et al. (2015) and Jarolmasjed et al. (2017) obtained a strong relationship between the spectral features and the calcium content in citrus leaves and apples, respectively, using wavelength ranges from 830 to 2600 nm. Along this line, future work using a higher wavelength range (>900 nm) will investigate a possible relationship between NIR spectral features and mineral content at an early stage of fruit growth to predict bitter pit potential. Further work will be necessary to develop calibration including data covering an appropriate range of instrumental, biological and environmental conditions to build more robust models applicable independently of external factors.

Figure 4. Comparison of average LDA-, QDA- and SVM-based binary classification accuracies (percentage of correctly classified apples), for each bitter pit class classification (class 1, class 3 and overall), using VIS/NIR data on healthy apples at 20 days before harvest (DBH) and bitter pit symptoms assessed after 10 d at room temperature, and at harvest on healthy apples and bitter pit symptoms assessed after 4 months in cold storage at 0 °C. Error bars indicate the standard error of the mean.

Table 3. p-values of Chi-square test from logistic regression analysis to test the main effects on classification accuracies (percentage of correctly classified apples) of 'algorithm' (LDA, QDA and SVM classifier models) and 'bitter pit class' (1, 3, overall), and their interaction, using VIS/NIR data when bitter pit symptoms were not visible. 'Preharvest': NIR measured at 20 d before harvest on healthy apples and bitter pit symptoms assessed after 7-10 d at room temperature (22-25 °C). 'Postharvest': NIR measured at harvest on healthy apples and bitter pit symptoms assessed after 4 months in cold storage (0 °C).

                              Preharvest   Postharvest
Algorithm                     NS           NS
Classification                NS           NS
Algorithm × Classification    NS           NS