Multivariate diagnosis of the variability of NIR spectrometers under industrial applications

The transfer of NIR spectroscopy to industry relies on the possibility of real time identification of abnormal spectra as well as uncontrolled sources of variation. This study proposes an unsupervised procedure for the identification under an industrial application of daily events (general changes) and abnormal observations. It consists in defining a spectral database at the beginning of a season, performing a principal component (PC) analysis, and calculating the PC scores over time. Process control statistics (Hotelling T, Q) are used for multivariate supervision of the industrial application. Within this procedure 10,400 average spectra of onion bulbs were evaluated identifying events in 12 out of 66 work dates, as well as spectral trends throughout the season of 2002.


Introduction
The viability of Near Infra Red (NIR) Spectrometry for internal quality assessment in fruit and vegetables is accepted world wide even for real-time applications.However, the transfer of technology to the agroindustry is still a challenge due to a high number of uncontrolled sources of variation which modify the spectral information, and reduce the accuracy of estimations.Some of these sources of variation are: the internal temperature of the product and the spectrometer (Hernández-Sánchez et al., 2003), the skin thickness (Krivoshiev et al., 2000), and the presence of boundary layers and voids inside the product (Fraser et al., 2003).
In the calibration of models, a difference should be stated between under and over fitted situations.The first ones do not take into account relevant information and thus lead to biased estimations, while the latter use non relevant (noisy) information leading to unrobust models when used under external validation (with new observations not used to calibrate the model) (Wortel et al., 2001).In many applications there is a limitation in the transfer of predictive models through a season and also between years due to a poor validation performance, though these models may be extremely useful for classifying a reduced number of categories (Guthrie et al., 1998).
Previous studies concerning the marketing process of high technology devices indicate that between 40 and 80% of new High-Tech products fail in the agro food industry.This rate is higher than that reported for other industrial sectors, and the poor product performance of these equipments is a main cause of failure (Karakaya and Kobu, 1994).
The 56 th Technical Committee from the International Electrochemical Commission gives some crucial definitions for instrumented systems such as fault and failure (Mellor, 2001).Fault refers to something that is wrong with a system which normally leads to an event, that is, an occurrence or happening usually significant to the performance of a function.Failure is an event in time at which an item ceases to perform a required function.Arising from the Safety Instrument Systems, a new concept has been def ined entitled Common Cause Fault (CCF).This term refers to a fault which causes the failure of multiple devices or processes.One of the main CCF is the loss of calibration of sensors, which in a wide sense may be due to drifts of the signal over long periods of time, mechanical and/or human faults (Summers and Raney, 1999).When the consequences of faults are delayed in time, these may be extremely difficult to identify before a failure occurs.
In this study we analyse the events occurred within the industrial use of a NIR spectrometer for the classification of onions according to their soluble solid content.This classification is performed within a genetic selection program.Selected onions have a high soluble solid content in order to optimise the dehydration procedure.
The objective of this study was to perform an unsupervised analysis of data to detect abnormalities in the NIR spectra due to: faults of the NIR spectrometer, specific spectral responses of specimens or changing environmental conditions, in order to avoid incorrect classification of samples.

Material and Methods
A NIR spectrometer (Hamamatsu PMA-11) that incorporates an indium, gallium and arsenic (InGaAs) detector array (256 elements) has been used, allowing spectral analysis in the 900 to 1,600 nm range, with a resolution of 2.7 nm.The minimum integration time for a single spectrum is 5 ms.In our case, the integration time was established in 70 ms for the applications with entire product when interactance (internal reflection) mode and spectra are acquired through an optical fiber in direct contact with the sample.This integration time guarantees a maximum signal around 90% of the dynamic range of the A/D card for the white reference, 80% for the samples (onion bulbs).
The spectrometer uses a mechanical shutter which closes the light path to the detector array when the integration time is attained.This device has been shown to be sensitive to a dusty environment such as most agro industrial factories, as a consequence of dust accumulation it can be stacked, leading to the failure of the equipment.
The plant material used for this study corresponds to 10,400 onion bulbs tested in the 2002 season (August to December) in the AGROTECNICA EXTREMEÑA S.L. facilities (Badajoz).This material was classified according to the average spectral data (three replicates per bulb in a selected area) into three categories of internal quality (low medium and high soluble solid content), by means of an estimation model developed in the previous season (2001).The estimation model (R 2 = 0.75) was adjusted with four sets of data (1,316 bulbs in total), following an iterative process which combines: stepwise multilinear regression for the different sets of data, comparison of selected variables, identification of frequently selected wavelengths and f inal readjustment of the model for the selected wavelengths.The model uses five wavelengths with scattering correction (subtraction of reflectance at 900 nm).This data pretreatment is important since it eliminates a main source of variation which is not relevant for our quality parameter.Therefore six wavelengths are used in total in the model allowing classification into the three mentioned quality categories with misclassification errors below 15% of individuals of the calibration set (Barreiro et al., 2002).
A reduced set of 400 bulbs of the 10,400 bulbs was used in order to assess the predictive capacity of the model.For this validation set, measurements on internal quality were taken with a reference procedure: soluble solids evaluation (measured in Brix degrees) by means of a refractometer (Barreiro et al., 1999).Also the surface (non destructive) and internal (5 mm depth, destructive) temperatures of the bulbs were recorded to trace the effect of this source of variation.Measurements were performed with a PT100 probe.
One operator was trained in the use of the NIR spectrometer by members of the Physical Properties Laboratory (LPF).The LPF was in charge of supervising the functioning and maintenance of the equipment and detected the following events (fault and failure): a) Malfunction of the mechanical shutter due to the aggression of the industrial environment This failure occurred the 4 th of October 2002.The equipment was then transported to the LPF, repaired and installed back in the company on 12 th October 2002.
b) Misalignment of the optical fibers.This fault occurs when the light slit of the optical fiber is not perfectly aligned with the monochromator; this situation was detected the day after reinstalling the equipment.
c) Excessive temperature fluctuations of the bulbs.In order to reduce this, the bulbs were stored near the NIR equipment 24 h before they were tested with the NIR spectrometer to avoid temperature differences between bulbs.Facilities lacked temperature control so the effect of temperature evolution during the season was not avoided.d) Specific characteristics in size and structure of the bulbs.Despite all the bulbs belonging to the same cultivar which is under a genetic selection procedure, bulbs with abnormal shapes, ribbed bulbs or loose layers can produce relevant differences in the scattering of light inside the bulb tissue.
The analytical procedure followed by the LPF for the data recorded with this industrial application included the following steps (Fig. 1): 1. Validation of the estimation model with a set of 400 bulbs.This step requires the measurements of soluble solids (ºBrix) with a refractometer as well as other reference measurements such as temperature of the bulbs (ºC), or the maximum signal (intensity counts) of the NIR reference (barium sulphate disk).
2. Definition of a spectral database (342 average spectra corresponding to bulbs tested at the beginning of the season, August) and performance of a Principal Components Analysis (PCA) with spectral variables centred and scaled to unit variance to avoid the effect of magnitude.Def inition of the spectral database consisted of choosing several test dates where the LPF expert was present and no fault or failure had been recorded.From hereon this dataset is referred to as reference spectral database or reference database.
3. Projection of the validation set (400 average spectra) onto the PC space obtained from the reference spectral database, and study of correlations between the principal components (PC) scores and measured sources of variation (soluble solids, bulb temperatures, maximum signal from the spectral reference).Null correlation between PC scores is expected unless shifts in the spectra occur simultaneously affecting their spectral variables.
4. Projection of the remaining 9,658 average spectra onto the PC space obtained from the reference database and identification of abnormal spectra and changes in behaviour of the NIR spectrometer.Abnormal individuals are addressed by means of normal probability plots.
5. Calculation of process control statistics (Hotelling T 2 and Q) and plotting of process control charts with regard to the Upper Control Limits (UCLs).
6. Averaging the PC scores per day and calculation of the cumulative sum in order to summarize the change in equipment performance.
The Hotelling T 2 is a multivariate statistic which may be used as an event indicator.It is computed daily as stated in Eq. [1], where [X] is the vector containing the average PC scores per day, [m] is the vector containing the average expected values, and [S] is the covariance matrix of the PC scores in a day which is expected to be the identity matrix.
Eq. [2].shows the computation procedure for the UCL of the Hotelling T 2 statistic, where n is the number of observations tested per day, p the number of PC scores considered and F the critical value for a Fisher distribution with α confidence and p, n-p degrees of freedom. [2] The Q statistic is developed to address atypical observations and is an indicator of how well each spectrum f its the PC model.It is computed as the normalized squared error between the original and the predicted spectra when using the PC scores.Under normal conditions the Q statistic has a multi-normal distribution and associated UCL may be estimated using a weighted chi-squared distribution (Simoglou et al., 2000).
Table 1 refers to the number of specimens and spectra obtained in the 2002 season.In order to perform all mentioned steps, devoted Matlab programs (Mathworks Inc.) were developed.

Results
A PCA on the reference database (342 average raw spectra) was performed using the spectral variables (in our case 256 wavelengths) centred and scaled to unit variance.A variance of 99.82% from the original 256 spectral variables was explained by means of f ive Principal Components (PC): 91.7% with PC1, 7.28% with PC2, 0.76% with PC3, 0.13% with PC4 and 0.08% with PC5.The first PC included all wavelengths since all 256 showed correlation coefficients above 0.8 with this factor.Correlation of the different wavelengths with the remaining PCs was always below 0.3.
Since the PCs are linearly correlated with the spectra variables, a normal behaviour of the latter leads to normal distributions of the PCs. Figure 2 shows the normal probability plots for the 10,400 average spectra considering PC1 and PC2 scores.Abnormal individuals are those which clearly separate from the expected normal behaviour.This segregation is clear for PC scores above and below 2 units; note that the mean value for both PC scores is 0.   Figure 3 shows as crosses (+) and circles (o) the PC scores of the reference spectral database (342 bulbs) and specific test data (396) respectively.The top-left plot represents the PC1-PC2 plane while the top-right plot presents the PC1-PC3 plane.Most of the PC scores stay within the [-2,+2] interval.Isolated individuals can be found out of this range specially using the PC2 score.The test date corresponded to the 14 th October 2002.This date showed a significant spectral shift due to incorrect reinstallation of the equipment after repair of the mechanical shutter, which was identified and corrected.
Since the representation of the PC scores produced interesting patterns, it was decided to perform on the validation set (400 bulbs) a correlation analysis using the PC scores together with reference variables: white level of the NIR reference (intensity counts), surface and internal temperature of the bulbs (ºC), observed and estimated soluble solids content (ºBrix).A very high correlation was found between the surface and internal temperature of the bulbs and PC5 (0.82 and 0.8 respectively), which was lower for PC1 (0.7).The white level (intensity counts) corresponding to the barium sulphate disk used for daily calibration of the spectrometer was not related to any of the sources of variation, revealing that this source of variation was manually controlled by the operators of the NIR equipment.One important feature emerging from this table was a high correlation between some of the PC scores (i.e.-0.82 between PC2 and PC3) for the validation set.This suggests the occurrence of new sources of variation that simultaneously affect the PC scores, and  their spectral variables, and which were not acting for the reference database.Also, in Table 2 the correlation between the internal quality parameter (soluble solids content) is explored in relation to the PCs and to the estimated soluble solid content using the model developed for the 2001 season.There was no statistically signif icant correlation between the observed soluble solid content and the PCs, while there was a significant correlation with the estimated soluble solid content according to the prediction model of 2001.This value (R = 0.63) is low when compared to that of the calibration set of the model (R = 0.86).Still only 2% of the individuals corresponding to low quality were erroneously classified as high class, and none of the high class bulb were rejected as belonging to a low class (data not shown).Although none of the PC scores showed a relevant correlation with the observed soluble solid content, the procedure used for model adjustment enables the relevant information concerning estimation of the quality parameter, soluble solid content, to be extracted.
Figure 4 shows the Hotelling T 2 distance computed for the combination of PC scores (1 to 5) along the 66 work dates in the industry.The upper control limit (UCL) for two different confidence intervals (90 and 95%) were also plotted considering an average of 150 individuals per date and events were identified for 12 out of 66 test dates.The highest abnormality was found for the 14 th October 2002.Data for this date are also shown in Figure 3 as circles ( ࠗ).
Figure 5 represents the Q statistic for all 10,400 average spectra of bulbs.The UCLs for 90% and 80% confidence intervals are plotted to identify abnormality of isolated individuals.This procedure can be easily implemented for real time applications.
The evolution of the daily averages of isolated PC scores over time has also been studied.Figure 6 shows the cumulative sum of the daily averages for PC5 scores throughout the season.A clear trend for PC5 towards lower daily values was observed from July to December.This PC factor had been previously related to the surface and internal temperature of the bulbs, as  stated in Table 2.No clear time dependent trends were found for the remaining PC scores.

Discussion
The strategy for the use of estimation models presented in this work exploits the concept expressed by Wortel et al. (2001) as a first step in the transfer of technology of the NIR technique to industry.We developed a not overfitted model in 2001 which had been tested with several calibration sets, and used it in the industry to classify into a reduced number of quality categories.To achieve accurate predictive models, wide representative calibration samples can be taken in industry.The amount of product processed in the industry per day allows a straight forward generation of wide spectral databases and preclassif ication of individuals.The results of even a coarse classification enables the selection of representative data sets to validate the model.Wortel et al. (2001) proposed the contamination of calibration data sets with noise in order to test the model robustness but with the current method new validation datasets can be obtained by daily application in the industry.Schaare and Fraser (2002) evaluated, in the estimation of the internal quality of kiwifruits, several procedures to pre-process the spectra and even to select the most suitable mathematical parameters to obtain a predictive model.They concluded that variables related with the second derivative of the spectra are the best of all the possible transformations of spectral variables.From our point of view, this decision is risky in the estimation of some quality parameters, like soluble solids, since the derivation procedure leads to a high increase in the noise to signal ratio of data and may complicate further model transfer to the industry.Still the use of derivation may not be totally inappropriate when searching for main sources of variation in spectra like water (Ortiz et al., 2001).
The lack of robustness in NIR models is well known in spectrometry.Guthrie et al. (1998), working with pineapples and melons, already stated the unfeasibility of using NIR models for quantitative estimations even within the same season though they proposed their use for classif ication purposes.This is also one of the conclusions of our study.Fraser et al. (2003) emphasize the fact that internal boundary layers in the products are critical for transmittance measurements but not so much for other types of sample presentations such as reflectance or interactance.This point is also very relevant for onion bulbs since the internal layers lose tightness during the post harvest period, and will have to be taken into account when trying to shift from interactance to transmittance in order to facilitate online measurements.
A main result of this study is that the periodical generation of spectral databases under industrial applications, together with simple multivariate data analysis, allows a continuous unsupervised extraction 0 1,000 2,000 3,000 4,000 5,000 6,000 7,000 8,000 9,000 10,000 of NIR features.The application of world wide accepted multivariate techniques like PCA combined with process control statistics (Hotelling T 2 ) enables events to be identified in the daily process using PC scores.
The presence of abnormal spectral responses of isolated bulbs can be highlighted as outlier dots in a normal probability plot though the main drawback concerning normal probability plots is that they may only be computed off-line when a whole population of data has been gathered.Instead, the use of the Q statistic commonly used for process control may easily be implemented in real time analysis for addressing spectra which are not well reproduced within the PC space.
Another important result or outcome of this work is the interest of studying the evolution in time of the PC scores of individuals.In this work a very close relationship is found between PC5 (0.08% of the spectral variance) and advance of the season (from July to December, 66 work dates).This PC factor (PC5) was also found to be highly correlated with the temperature of the onion bulbs in the validation set (r = 0.8).The effect of the temperature of the product on the NIR spectra is known.Hernández-Sánchez et al. (2003) estimated the bias in soluble solid prediction in apple due to a 30ºC temperature variation around 2ºBrix.This temperature variation is excessive when considering harvesting conditions, and even extreme when comparing fruits from cold storage and ambient conditions.In the case of onion bulbs over the postharvest season we expect a 10ºC variation range when no temperature control is used.Bulb temperature may then be integrated as an independent variable in the prediction models.Another interesting possibility to be explored is the use of specific PC information like PC5 for the internal correction of temperature in the spectra.A general conclusion of this work is that it is possible to address abnormalities in the NIR spectra emerging from non supervised analysis of industrial databases, which can then be related to new sources of variation.

Figure 1 .
Figure1.Scheme of the data analysis procedure followed in this study.PCA stands for Principal Component Analysis and MLR for multiple linear regression.

Figure 2 .
Figure 2. Normal probability plots for PC1 and PC2 scores (10,400 data).Abnormal values are those which clearly segregate from expected normal ones (line).

Figure 3 .
Figure 3. Example of result obtained under industrial use: crosses (+) represent PC score for the reference spectral database (342 bulbs) while circles (ࠗ) indicate the PC scores obtained for individuals tested on 14 th October 2001 (396 bulbs).Bottom-left and bottom-right plot refer to the raw average spectra obtained for individuals from the original database and from 14 th October 2001 respectively.

Figure 4 .
Figure 4. Hotelling T2 computed for the combination of the five PC considered (1 to 5).Events are found for 12 out of 66 test dates in the industry.The highest deviation is found for the 14 th October 2002.This date also corresponds to the data shown in Figure3as circles (ࠗ).UCL stands for Upper Control Limit which has been computed for 95% and 90% confidence levels, and an average of 150 individuals tested per date.

Figure 5 .Figure 6 .
Figure5.Q statistic computed for the combination of the five PC considered (1 to 5) for all 10,400 bulbs tested.UCL stands for Upper Control Limit which has been computed for 90% and 80% confidence levels.Values above UCL refer to significant mismatch between the original and predicted spectra within the PC model.

Table 1 .
Summary of samples used for NIR and reference analysis: soluble solids (°Brix), internal and surface bulb temperature (°C)

Table 2 .
Correlation matrix for the validation set (n = 400)Variables included are: PC scores, external and internal temperature of the bulbs, the daily white level signal of the spectrometer, and the observed and estimated values of soluble solids (ºBrix).Independence of variables is found for correlation values below 0.13.Bold numbers enhance correlations above 0.6.