Using NDVI and guided sampling to develop yield prediction maps of processing tomato crop

The use of yield prediction maps is an important tool for the delineation of within-field management zones. Vegetation indices based on crop reflectance are of potential use in the attainment of this objective. There are different types of vegetation indices based on crop reflectance, the most commonly used of which is the NDVI (normalized difference vegetation index). NDVI values are reported to have good correlation with several vegetation parameters including the ability to predict yield. The field research was conducted in two commercial farms of processing tomato crop, Cantillana and Enviciados. An NDVI prediction map developed through ordinary kriging technique was used for guided sampling of processing tomato yield. Yield was studied and related with NDVI, and finally a prediction map of crop yield for the entire plot was generated using two geostatistical methodologies (ordinary and regression kriging). Finally, a comparison was made between the yield obtained at validation points and the yield values according to the prediction maps. The most precise yield maps were obtained with the regression kriging methodology with RRMSE values of 14% and 17% in Cantillana and Enviciados, respectively, using the NDVI as predictor. The coefficient of correlation between NDVI and yield was correlated in the point samples taken in the two locations, with values of 0.71 and 0.67 in Cantillana and Enviciados, respectively. The results suggest that the use of a massive sampling parameter such as NDVI is a good indicator of the distribution of within-field yield variation. Additional key words: Solanum lycopersicum; ordinary kriging; regression kriging; vegetation index; precision agriculture. Abbreviations used: ETc (crop evapotranspiration); ETo (reference crop evapotranspiration); Kc (dual crop coefficient); Kcb (basal crop coefficient); ke (evaporation coefficient) NDVI (normalized difference vegetation index); NIR (near infrared); PCA (principal components analysis); RRMSE (relative root mean square error) Citation: Fortes, R.; Prieto, M. H.; García-Martín, A.; Córdoba, A.; Martínez, L.; Campillo, C. (2015). Using NDVI and guided sampling to develop yield prediction maps of processing tomato crop. Spanish Journal of Agricultural Research, Volume 13, Issue 1, e02-004, 9 pages. http://dx.doi.org/10.5424/sjar/2015131-6532. Received: 14 Jul 2014. Accepted: 10 Feb 2015 http://dx.doi.org/10.5424/sjar/2015131-6532 Copyright © 2015 INIA. This is an open access article distributed under the Creative Commons Attribution License (CC by 3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Funding: Project GRU 10130 (Regional Government of Extremadura) and Project A-E-11-0255-4 (co-financed by Roma SL and the Regional Government of Extremadura). Both projects were co-financed through the European Regional Development Fund (FEDER). Competing interests: The authors have declared that no competing interests exist. Correspondence should be addressed to Rafael Fortes: rafaelfortesgallego@outlook.com, or Carlos Campillo: carlos.campillo@ gobex.es (shared corresponding authors)


Introduction
The processing tomato is one of the most important crops in Spain, producing around 1.97 million tonnes.In recent years, the management regime of this crop has undergone a series of changes as a result of an increase in average field size.New tools are consequently required to enable a global view of these larger-sized fields and to determine the heterogeneous 2 are correlated with each other and beyond which they become independent (Koller & Upadhyaya, 2005).The parameters of the best-fitted model for a variogram can be used for kriging (Matheron, 1963;Stein & Corsten, 1991).From the point of view of data analysis, kriging has been recommended as the best method to interpolate point data since it minimizes error variance using a weighted linear combination of the data (Panagopoulos et al., 2006).There are also numerous studies demonstrating the benefits of geostatistical analysis techniques for agricultural management.For example, kriging has been used to map the density of weeds in winter wheat (Heisel et al., 1996) and geostatistical methods have been used to interpolate data and produce maps of a field representing the spatial variability of all the soil and wheat properties (Stewart et al., 2002).Techniques like regression kriging, which involves various combinations of linear regressions and kriging, are useful tools to improve geostatistical analysis in prediction maps.The simplest model is based on a normal regression followed by ordinary kriging with the regression residuals (Odeh et al., 1995), hence the importance of analysing the robustness of the prediction maps using tools like principal components analysis (PCA), a multivariate technique commonly used for the analysis of several variables simultaneously (Balzarini et al., 2011).The aim of the present study is to describe the possibility of merging guided yield point sampling with massive sampling (NDVI) to produce yield prediction maps for the processing tomato crop in a reliable way, using a multivariate analysis technique like regression-kriging.

Study area
The field research was conducted in two farms, "Los Enviciados" (-7.009427 38.950592 decimal degrees) and "Cantillana de Mesas" (-6.942781 38.946368 decimal degrees), with study areas of 6.50 ha and 7 ha, respectively.The farms are situated in the proximity of Badajoz (southwest Spain).The climate of this area is characterized by variation in both temperature and precipitation typical of a Mediterranean climate, with mean annual precipitation of less than 500 mm.One of the most important characteristics of the precipitation is its interannual variability.There is a dry season, from June to September, and a wet season, from October to May (80% of the precipitation falls between these months).Summers are hot, with temperatures sometimes rising above 40ºC.(Gianquinto et al., 2011).Accurate estimation of yield can be used for zonal management of the most productive areas, to plan the best time for harvesting and its transport for industrial processing, and to locate any water and nutritional deficiencies in the crop.Yield monitoring and mapping have given producers a direct method for measuring spatial variability in crop yield.Yield maps have shown high-yielding areas to be as much as 150% higher than low-yielding areas (Kitchen et al., 1999).However, yield maps are confounded by many potential causes of yield variability (Price et al., 1997).When other geo-referenced information is available, producers naturally want to know how these various layers of data can be analysed to help explain yield variability and provide insight into improving production practices (Kitchen et al., 2003).Along with yield mapping, producers have expressed increased interest in characterizing soil and topographic variability.Vegetation indices based on crop reflectance are of potential use in the attainment of these objectives.Developments in the use of sensors have enabled massive geo-referenced data sampling of this parameter and numerous studies have investigated the correlation between vegetation characteristics and remotely sensed canopy reflectance (Xue et al., 2004;Jongschaap, 2006;Gianquinto et al., 2011).There are different types of vegetation indices based on crop reflectance, the most commonly used of which is the NDVI (normalized difference vegetation index).NDVI values are reported to have good correlation with several vegetation parameters.It has been shown that NDVI is a near linear indicator of photosynthetic capacity (Sellers, 1985), while other studies have revealed that it is a good indicator of vegetation, crop biomass and health in agricultural applications (Koller & Upadhyaya, 2005).Gianquinto et al. (2011) studied the ability to predict tomato yield using different vegetation indices based on crop reflectance.The indices that used green and near infrared (NIR) wavelength were the best indicators, with high precision also obtained for small variations in yield.In a previous study, Koller & Upadhyaya (2005) used a vegetation index based on plant greenness which appeared to be reliable for evaluation of tomato yield.Field measurements have traditionally been gathered as point data, such as samples from an individual plant, but newly developed devices allow mass data collection to give more weight to yield prediction maps (Fortes et al., 2014).Spatial analysis methods can be used to interpolate measurements to create a continuous surface map or to describe its spatial pattern.Variograms (also referred to as semivariograms) are a powerful tool in geostatistics which characterize the spatial dependence of data and give the range of spatial correlation within which the values 3 NDVI and guided sampling to develop yield prediction maps

Sampling
The NDVI survey was conducted in August 2013, 12 days before harvesting, with a Crop Circle ACS-470 reflectance sensor (Holland Scientific Inc., Lincoln, NE, USA) held by a tractor at a height of 80 cm above the canopy.The Crop Circle ACS-470 generated reflectance data in the wavebands 670 (red) and 760 (NIR) which were combined to obtain the NDVI fol- , where ρNIR is a reflectance value of the waveband (760 nm) and ρ d Re is the waveband (670 nm).
An ARVATEC monitor with Topcom GB500 GPS and JAVAD GDD base with sub-meter accuracy was used to geo-reference the NDVI measurements.NDVI data at 10 second intervals were recorded on an ACS-470 data logger in an ASCII text format.Later, this raw ASCII file was transferred to other software for further analysis.NDVI measurements were made along different parallel transects approximately 8 m apart (yellow dots with the appearance of lines in Fig. 1) and the final database contained 11,497 and 15,878 values for the Enviciados and Cantillana locations, respectively.Ordinary kriging was used to develop an NDVI prediction map for the two locations (Figs.2A and 2B).These maps were then used to guide yield sampling to cover the different homogeneous zones described by the NDVI prediction maps for use in the methodology (red flags in Fig. 1), while other samples were randomly

Crop characteristics
The field was transplanted with processing tomato (Solanum lycopersicum L., cv.H9661) in April 2013, at a planting density of 33,333 plants/ha in single rows, with a distance between plants of 30 cm and width between beds of 1.5 m.The total water applied was 750 mm to all study surfaces.A drip irrigation tape was used with drippers of 1 L/h each 30 cm.The tested area comprised a single irrigation sector, where the same amount of water was applied in each irrigation.Crop management involved organic practices with no inorganic nitrogen fertilizer input.Organic nitrogen fertilizer (280 units) was applied before transplanting.Crop evapotranspiration (ETc) was calculated on a daily basis using equation ETc = ETo•Kc, where ETo is the reference crop evapotranspiration rate.It was calculated following the Penman-Monteith method, modified and adapted to local conditions (Baselga, 1996).Climate data were obtained from a weather station located near the experimental area.Kc is the dual crop coefficient for processing tomato (Allen et al., 1998), Kc = Ke + Kcb; considering the basal coefficients (Kcb), Kcb_ini (0.20); Kcb_mid (1.11); Kcb_end (0.60), and Ke being the evaporation coefficient.Although the tape was buried (not evaporation), we have considered an evaporation coefficient of 0.1, because surface water was observed during growth and we measured the width of the wet part.A flow meter was installed at different inlet points to measure the real volume of water applied.

4
(Table 2) using SPSS v.13 Windows Package (SPSS, Chicago, IL, USA).When the extent of the relationship between the parameters was known a prediction yield map (Figs.2C and 2D) for each entire plot was developed using ordinary kriging technique as described above.Regression kriging technique was then used to develop a final yield prediction map (Figs.2E and 2F).With this technique, predictions are made separately for the trend and residuals and then added back together.Thus, any parameter at a new unsampled point, x, is estimated, Z*RK(x), using regression kriging as follows: where the trend, m(x), is fitted using linear regression analysis and the residuals, r(x), are estimated using ordinary kriging algorithm.If cj are the coefficients of the estimated trend model, vj(x) is the jth predictor at location x, p is the number of predictors and wi(x) are the weights determined by solving the ordinary kriging system of the regression residuals, r(xi), for the n sample points, then the prediction is made by: In this case study, only one predictor is used, NDVI, so m(x) = a + b•(NDVI(x)).In consequence: The residual at each sampling point, r (xi), is calculated as the difference between the value of the parameter and the estimate by the trend (r The geostatistical analyses were conducted using the Geostatistical and Spatial Analyst extensions of the GIS software ArcGIS (v.10.0, ESRI Inc., Redlands, CA, USA).All maps were produced with the ArcMap module of ArcGIS.
Finally, a comparison was made between the yield obtained at the validation points and the yield values according to the prediction maps (ordinary kriging and regression kriging).In order to assess the quality of the maps in terms of yield prediction we adopted the sta-taken for validation of the methodology (green flags in Fig. 1).The harvested zones were measured and geo-referenced, with the measurements made along different parallel transects, approximately 8 m long.Calculation of production yield was made using the following equation:

Statistical and geostatistical analysis
Firstly, an NDVI prediction map for each entire plot was developed using ordinary kriging technique [see Isaaks and Srivastava, (1989) and Goovaerts, (1997) for a detailed presentation of the kriging algorithms].The prediction maps were developed by the geostatistical interpolation techniques described by Moral et al. (2010) in three phases: (i) An exploratory analysis in which the data were studied without considering their geographical distribution and statistics were applied to check data consistency, removing any outliers and identifying the statistical distribution of the data.(ii) A structural data analysis was developed, in which spatial distribution was evaluated using variograms of the variable (NDVI); the equation of the mathematical model, nugget effect (micro-scale variation or measurement error), sill (variance of the random field) and range (distance at which data are no longer auto-correlated) were used to develop these variograms.(iii) The prediction maps were developed using a data search by neighbourhood copied from the variogram; the NDVI prediction maps were used to guide yield sampling.
A study was then made of the areas from which the samples were taken to determine the relationship between NDVI and yield properties (Table 1).A comparison was made between the NDVI obtained at the sampled points and the yield values measured at those points.A Pearson's correlation matrix and a coefficient of determination between NDVI and yield were then obtained variograms showed a considerable nugget effect (Fig. 3).This was not unexpected since the variability of crop properties can occur at a scale smaller than the minimum lag distance.Table 4 shows the results of theoretical spherical variograms (models that provided the best fit for all cases) fitted to experimental variograms for the residual data.Yield shows the highest nugget-to-sill ratio (~60% in the two locations), suggesting moderate spatial autocorrelation.NDVI had lower nugget-to-sill ratio values (19% in Cantillana and 23% in Enviciados), indicating that spatial dependence was generally strong.The range, that is the maximum distance of spatial dependence, varied from 57 to 163 m, with higher values being obtained for yield than for NDVI.To develop the proposed regression kriging methodology it was necessary to define the best linear relationship for each location (Table 1).After generating the yield prediction maps with both geostatistical analysis models (ordinary and regression kriging), RRMSE was used to measure their resolution.
Table 5 shows this analysis, with the best results obtained using the maps developed by regression kriging, with respective values of 17% and 14% in Cantillana and Enviciados.The values obtained with ordinary kriging of 31% and 27% in Cantillana and Enviciados, respectively, can also be considered acceptable (Jamieson et al., 1991).Figs. 2 A and 2B show the NDVI prediction map for the two sites.Darker green colours correspond to higher NDVI values and their distribution, lighter green colours correspond to lower NDVI values.Figs.2C and 2D show yield estimated by ordinary kriging at the two locations.A very similar trend can be observed to that of NDVI in Fig. 2, with higher yield values for those areas where NDVI also showed tistical procedure proposed by Loague & Green (1991), consisting of the best fit of the predictions.This procedure is based on the relative root mean square error (RRMSE), calculated from the following equation: where n is the number of observations, Pi is the predicted value, Oi is the measured value, and Õ is the mean of the measured values.Validation is considered to be excellent when the RRMSE is <10%, good if it is between 10% and 20%, acceptable if the RRMSE is between 20% and 30%, and poor if it is >30% (Jamieson et al., 1991).

Results
Table 3 shows an exploratory analysis of the data distribution, described using classical descriptive statistics.Coefficient of variation (CV) values are similar for all parameters.Mean and median values were very similar, which could be initially indicative of data from a normal distribution.This was ratified by the fact that low skewness values were obtained.The skewness value is based on the size of the tails of a distribution and provides a measure of how likely the distribution will produce outliers.Thus, in this work, outliers should be scarce, and in fact there were not any, which is important to obtain accurate estimates.And, moreover, most of the coefficients of kurtosis were close to 3 (kurtosis of a normal distribution).Although normality is not a prerequisite for kriging, kriging will only generate the best absolute estimate if the random function fits a normal distribution (e.g., Goovaerts, 1997).A correlation matrix between NDVI and yield in the study areas was also developed (Table 2).The coefficient of correlation between NDVI and yield was correlated in the point samples taken in the two locations, with values of 0.71 and 0.67 in Cantillana and Enviciados, respectively.Spherical mathematics models were used to develop the variograms.In this work, the    NDVI and guided sampling to develop yield prediction maps sented on Fig. 3. Lower nugget effects were obtained for NDVI in both plots (Figs.3A and B), consequence of the sampling density, the structural analysis was more accurate that for yield (Fig. 3C and D), also their sills were different because their variances were different as well.The ranges for both NDVI were similar, and lower than yield.Table 5 shows that better results were obtained with the maps developed with regression kriging, with RRMSE values of 14% and 17% in Cantillana and Enviciados, respectively, compared to 27% and 31 % when using ordinary kriging.

Discussion
The exploratory analysis of the data distribution in this work could indicate a normal distribution of the data, as it is indicated in the results chapter, corroborating the choice of kriging for this work.Kriging has been recommended as the best method to interpolate point data since it minimizes error variance using a weighted linear combination of the data (Panagopoulos et al. 2006).Given the large number of sampling points taken, and the spatial distribution of them in our work, a method as kriging was needed to minimize higher values (blue colours), though a larger transition area (green and yellow colours) can also be seen as a result of the fewer samples used in the geostatistical analysis.With this in mind, a third map was generated (Figs.2E and 2F) in which yield was estimated for the two locations using regression kriging.With the higher yield values shown in blue and the lower values in red and orange, it can be seen that the transition surfaces are practically negligible as occurred in the NDVI prediction maps, because the smaller number of samples of the primary variable (yield) was associated with a broadly sampled secondary variable (NDVI).This was confirmed by the RRMSE geostatistical analysis comparing the regression and ordinary kriging prediction maps.Theoretical spherical models are repre-   (Isaaks & Srivastava, 1989).For this reason, spherical mathematical models were used to develop the variograms in this work.These variograms showed a considerable nugget effect (Fig. 5), with yield showing the highest nugget-to-sill ratio (~60% in the three locations), suggesting moderate spatial autocorrelation.NDVI had lower nugget-to-sill ratios (~20%), indicating that spatial dependence was generally strong.According to Cambardella et al. (1994), the nugget-to-sill ratio can be used to denote the spatial dependence of attributes (ratio < 25% indicates strong spatial dependence; between 25 and 75% moderate spatial dependence and greater than 75% weak spatial dependence).The NDVI index is an indicator that can be related with different physiological processes essential to achieve yield (Gianquinto et al., 2011).Many parameters are crucial for the functionality of the basic plant physiological processes and affect crop yield, parameters as chlorophyll concentration has been studied for many authors using the vegetation index based on plant greenness with good results (Vouillot et al., 1998;Koller & Upadhyaya, 2005;Gianquinto et al., 2011).In our study case, this parameter was not studied, but it is a parameter related with the health status of the plant, which is necessary to reach the suitable level of yield.Vegetation indices based on plant greenness also have been used to predict yield in differents crops (Ma et al., 2001).In the specific case of processing tomato, Gianquinto et al. (2011) confirmed correlations between many of the calculated reflectance indices (including NDVI), with yield in processing tomato crop.We obtained similar results in our work for NDVI in processing tomato too, where areas with high NDVI values were also the most productive.
There are no studies showing the benefits of geostatistical analysis techniques to develop NDVI-based yield prediction maps for the processing tomato, though numerous studies have shown the benefits of geostatistical analysis techniques for agricultural management (Heisel et al., 1996;Stewart et al., 2002;Yamagishi et al., 2003).This study shows that prediction maps based on the normalized difference vegetation index (NDVI) were suitable for describing crop yield.The NDVI data,

Table 2 .
Correlation matrix between yield and NDVI for the two locations.

Table 3 .
Descriptive statistics of the sample data in the study areas n = number of samples, SD = standard deviation, CV = coefficient of variation.

Table 4 .
Theoretical spherical variograms fitted to experimental variograms for the residual data.

Table 5 .
Relative root mean square error (RRMSE, in %) determined for the different locations with the two applied geostatistical estimation methods.