Uncertainty analysis of the HORTSYST model applied to fertigated tomatoes cultivated in a hydroponic greenhouse system

Aim of study: The objective was to perform an uncertainty analysis (UA) of the dynamic HORTSYST model applied to greenhouse grown hydroponic tomato crop. A frequentist method based on Monte Carlo simulation and the Generalized Likelihood Uncertainty Estimation (GLUE) procedure were used. Area of study: Two tomato cultivation experiments were carried out, during autumn-winter and spring-summer crop seasons, in a research greenhouse located at University of Chapingo, Chapingo, Mexico. Material and methods: The uncertainties of the HORTSYST model predictions PTI, LAI, DMP, ETc, N up , P up , K up , Ca up , and Mg up uptake, were calculated, by specifying the uncertainty of model parameters 10% and 20% around their nominal values. Uniform PDFs were speci-fied for all model parameters and LHS sampling was applied. The Monte Carlo and the GLUE methods used 10,000 and 2,000 simulations, respectively. The frequentist method included the statistical measures: minimum, maximum, average values, CV, skewness, and kurtosis whilst GLUE used CI, RMSE, and scatter plots. Main results: As parameters were changed 10%, the CV, for all outputs, were lower than 15%. The smallest values were for LAI (10.75%) and DMP (11.14%) and the largest was for ETc (14.47%). For Ca up (12.15%) and P up (12.27%), the CV was lower than the one for N up and K up . Kurtosis and skewness values were close as expected for a normal distribution. According to GLUE, crop density was found to be the most relevant parameter given that it yielded the lowest RMSE value between the simulated and measured values. Research highlights: Acceptable fitting of HORTSYST was achieved since its predictions were inside 95% CI with the GLUE procedure.


Introduction
An important issue in greenhouse horticulture is optimization of water and nutrients, which can be tackled by using decision-support-systems (DSS) based on dynamic mathematical models such as VEGSYST (Gallardo et al., 2011). However, the development of a mathematical model of a system implies not only the derivation of model structure and the quantification of relationships between the components of a system but also sensitivity and uncertainty analyses, parameter estimation and model evaluation. Uncertainty assessment (Walker et al., 2003) is a crucial stage in model development focused on quantifying the reliability of model predictions. Refsgaard et al. (2005) emphasized that performing an uncertainty analysis (UA) on a model start with problem definition and identification of modeling objectives. In order to be most useful, the decision support model should also include information about the uncertainties related to each decision option as uncertainty of the desired outcomes may be the central criterion for the selection of the management policy (Wallach et al., 2014;Uusitalo et al., 2015;Liang et al., 2017).
Several methodologies and suitable tools for supporting uncertainty assessment have been developed and reported by Cooman & Schrevens (2006). There are few studies reporting the frequentist uncertainty analysis (Monte Carlo method) applied to greenhouse crop models, some of these are TOMGRO model applied to tomato (Cooman & Schrevens, 2006) and NICOLET model to lettuce . Most of the research has been focused on open field crops using, for example, the CERES-maize model (Bert et al., 2007;Li et al., 2012), the SALUS model for maize, peanut and cotton (Dzotsi et al., 2013), and the WARM rice model (Confaloneri et al., 2016). However, only few studies, such as one involving the SIMRIW model for paddy rice and another using the CSM-CROPGRO-cotton model for open field crops (Iizumi et al., 2009;Pathak et al., 2012) have studied the generalized likelihood uncertainty estimation (GLUE) method. To the best of our knowledge GLUE is the most reliable uncertainty analysis procedure developed until now.
HORTSYST is a new nonlinear dynamic growth model for tomato (Solanum lycopersicum L.) grown in hydroponic greenhouse systems. The HORTSYST crop model predicts photo-thermal time (PTI), dry matter production (DMP), and nitrogen (N up ), phosphorus (P up ), potassium (K up ), calcium (Ca up ) and magnesium (Mg up ) uptake, as state variables, because they are represented as daily variation rates, and it predicts crop transpiration (ETc) and leaf area index (LAI) as output variables. This model was developed by Martinez-Ruiz et al. (2019;2020) to be used as a tool for decision-support systems. Although it does not currently consider any kind of stress, future work should consider water and nutrients limitations. HORTSYST was developed based on the VegSyst model (Gallardo et al., 2011;Giménez et al., 2013;Granados et al., 2013), but HORTSYST has the following differences: 1) the HORTSYST dynamic model is written explicitly in discrete time, 2) the LAI variable was added, and it is simulated from a concept called photothermal time (which couples the effect of air temperature and global solar radiation measured in the greenhouse), 3) it considers crop density as one of the most important parameters, 4) crop water consumption is calculated with a transpiration model based on mass and energy balances (Martinez-Ruiz et al., 2012;2020), and 5) N up , P up , K up , Ca up and Mg up uptake are included in the model structure.
As in the case of the VegSyst model, the HORTSYST model is intended to support the management of water and nutrient supply for greenhouse crops; however, the model's ability to provide reliable forecasting under several input variables scenarios is currently unknown. Unfortunately, it is not possible to carry out many experiments to determine the robustness of model predictions. Therefore, an uncertainty analysis based on Monte Carlo simulations is chosen to implement multiple simulations based on model parameters values sampled from probability density functions (PDF), with a given amount of uncertainty, assigned to each model parameter. From these scenarios the predicted variables are calculated, and their uncertainty quantified and analyzed using descriptive statistics.
There is little current research dealing with the application of uncertainty analysis applied to greenhouse crop models and sometimes these methodologies are confused with sensitivity analysis procedures. Therefore, the objectives of this research were: 1) to quantify the uncertainty present in the variables PTI, LAI, DMP, ETc, nitrogen, phosphorus, potassium, calcium, and magnesium uptake, predicted by the HORTSYST model by varying nominal parameter values by 10% and 20%; and 2) to calculate model uncertainties with the frequentist uncertainty (Monte Carlo) method and the GLUE method using data from two experiments carried out in the autumn-winter and spring summer tomato crop growing seasons.

Greenhouse condition and data acquisition
Two experiments were carried out in greenhouses located at the University of Chapingo, Mexico (19°29' N, 98°53' W; 2240 m a.s.l.). During the autumn-winter and spring-summer seasons, tomato (Solanum lycopersicum L) cultivar "CID F1" was grown in a hydroponic system. Plastic bags of 10 L capacity were used, which were filled with volcanic sand (Tuff) as substrate. Plants were distributed with a density of 3.5 plants m -2 for both crop 3 HORTSYT model applied to fertigated tomatoes in a hydroponic greenhouse system seasons. In the first experiment, tomato seeds were sown on July 18, 2015, and the seedlings were transplanted on August 21, 2015, in an 8 × 8 m chapel type glasshouse. For the second experiment, the seeds were sown on March 24, 2016, and the seedlings were transplanted on April 24, 2016, in 35 × 35 cm 12-L polyethylene bag pots in an 8 × 15 m plastic greenhouse with natural ventilation. Both experiments were fertilized with Steiner nutrient solution (Martínez-Ruiz et al., 2019;2020). A weather station (HOBO, Onset Computer Corporation, Bourne, MA, USA) was installed inside the greenhouse. Temperature and relative humidity were measured with an S-THB-M008 model sensor (Onset) placed at a height of 1.5 m. Global Solar Radiation was measured with an S-LIB-M003 sensor (HOBO, Onset) located at 3.5 m above the ground. Both sensors were connected to a U-30-NRC model data logger (HOBO), and the data were recorded every minute, and subsequently the data were processed to obtain average data at hourly intervals.
In each experiment, three plants were randomly chosen and harvested every ten days to measure the DMP, LAI, N up , P up , K up , Ca up , and Mg up . The plants were dried for 72 h at 70 °C in a convection drying oven (Binder, ED-400 model). Nutrients in the stems, leaves and fruits were determined. Then 0.5 g dry matter was subjected to wet digestion with mixture of 5 mL of sulfuric acid and perchloric acid at a 4:1 ratio, both acids with 99% purity. Samples were digested until fully mineralized at a temperature of 250 ºC for approximately six hours; subsequently, 2 mL of hydrogen peroxide at 30% were added to the mineralized samples, and they were adjusted to 50 mL with deionized water. Nitrogen was determined by the Kjeldahl method (Sáez-Plaza et al., 2013). Phosphorus concentration was determined by the yellow molybdovanadate method, and K, Ca and Mg concentrations were measured by atomic absorption spectrophotometry (Oliveira et al., 2010). LAI was estimated by a nondestructive method which consisted of randomly taking four plants to measure the width and length of their leaves, and total leaf area was measured using a plant canopy analyzer (LAI-3100, LI-COR, Lincoln, NE, USA). From those measurements, nonlinear regression models were fitted to calculate the crop LAI. ETc was measured every minute by a weighing lysimeter located in the central row of each greenhouse. The device included an electronic balance Sartorius QA model (scale capacity=120 kg, resolution ±0.5 g equipped with a tray holding four plants. The weight loss measured was assumed to be equal to the ETc (Martinez-Ruiz et al., 2012).

Model description
The dynamic HORTSYST model assumes no water and nutrient limitations (Martínez-Ruiz et al., 2019;2020), and it simulates PTI (MJ m -2 d -1 ), DMP (g m -2 ), N up (g m -2 ), P up (g m -2 ), K up ,(g m -2 ), Ca up (g m -2 ), and Mg up (g m -2 ) as the state variables, while ETc (kg m -2 ) and LAI (m 2 m -2 ) were considered as output variables. Table  S1 [suppl.] lists the mathematical equations of the seven state and two output variables. Fig. 1 shows the general structure of the model using a Forrester diagram, where the inputs, parameters, state variables and outputs of the model are drawn.
The fraction of light intercepted (f i-PAR ) is the fraction of global solar radiation used for the photosynthesis process that enters through the canopy of a crop and is characterized by the LAI. The extinction coefficient (dimensionless k parameter) is related to leaf size and leaf orientation. Leaf area (LA) was modeled as a function of PTI using the Michaelis-Menten equation and it was multiplied by the planting density d in order to calculate the LAI. For this purpose, the normalized thermal time (TT, ºC), defined as the ratio of the growth rate and the conditions of actual and optimum temperature ( Dai et al., 2006), was calculated with Eq. (4). Then, the daily ΔPTI (Eq. 7) was calculated as the product of normalized thermal time with the fraction of light intercepted (f i-PAR ) and PAR radiation, and the accumulation of PTI was computed by Eq. (1) (Xu et al., 2010).
Daily ΔN up , ΔP up , ΔK up , ΔCa up and ΔMg up were calculated as the product of simulated DMP and nutrient concentration (Eq. 11). Previously the %N, %P, %K, %Ca and %Mg concentrations were determined by a power equation (Tei et al., 2002) (Eq. 10). Then their accumulated values were computed by Eqs. (12-16). Finally, the ETc was simulated hourly by Eq. (18), using the data recorded for global solar radiation, vapor pressure deficit, the fraction of light intercepted, and LAI as shown in Eq. (17). The daily transpiration was accumulated for the 24-h period using Eq. (3).

Monte Carlo uncertainty method
Monte Carlo simulation is a statistical technique for stochastic modeling and analysis of error propagation in calculations. Its aim is to trace out the structure of the probability distributions of the model output variables. These distributions are mapped by quantifying the deterministic results (realizations) for a large number of unbiased random draws (Matott et al., 2009) from the individual distribution function of input data and model parameters (Monod et al., 2006).
The uncertainty analysis consisted of the following four steps (Monod et al., 2006): a) for the spring-summer season, uniform PDFs were selected for HORTSYST model parameters (Table 1). Other PDFs are possible but the only information available for model parameters were their nominal values. Nominal values for all parameters were taken from literature. The lower and upper limits of the uncertainty intervals were defined with 10% and 20% of parameters variation around their nominal values as listed in Table 1; b) Latin hypercube sampling (LHS) was applied to choose model parameters values for the generation of N=10,000 scenarios; c) the output variables predicted by the model were calculated for all N scenarios, running N model simulations, using the climatic data measured in the greenhouse (Figs. 2a,2b,2c); and d) for the predicted variables (PTI, LAI, DMP, N up , P up , K up , Ca up , Mg up and ETc), the following statistical indicators were calculated: minimum, maximum, mean, coefficient of variation (CV), skewness, and kurtosis, as well as the histograms.

The generalized likelihood uncertainty estimation (GLUE) uncertainty method
This method is based on the Monte Carlo simulation, in which parameter sets may be sampled from some probability distribution function (PDF). The most commonly used PDF is a uniform distribution. Parameters values are also sampled from those PDF. Each parameter set is used to produce a model output; the acceptability of each model run is then assessed using a goodness-of-fit criterion which compares the predicted and observed values over some calibration period. As part of the GLUE procedure (Beven & Freer, 2001;Makowski et al., 2002;Stedinger et al., 2008;Beven & Binley, 2014), several likelihood functions can be used such as RMSE (root mean square error), inverse error variance, efficiency index, among others.
The GLUE procedure was applied to the HORTSYST model using LHS with 2,000 samples. Thus, 2,000 simulations were run using 10% and 20% of parameters variations around the nominal values of the 24 parameters (listed in Table 1), for the autumn-winter crop cycle. In order to confirm and compare the results that will be obtained with Frequentist Method what is considered the standard method for running a uncertainty analysis. The Matlab toolbox sensitivity analysis for everybody (SAFE) was used to implement the GLUE uncertainty analysis method HORTSYT model applied to fertigated tomatoes in a hydroponic greenhouse system since SAFE contains a module that allows not only performing uncertainty analyses but also it includes visualization tools such as scatter plots. This toolbox is freely available from the authors for non-commercial research and educational uses (Pianosi et al., 2015).

Results
Figs. 2a, 2b and 2c show the daily averaged measurements of the integration of global solar radiation, air temperature, and relative humidity inside the greenhouses for the autumn-winter and spring-summer seasons. These data were fed as input variables into the HORTSYST model. Input variable values represent two different weather conditions since averaged values of global radiation were 3.99 and 10.59 MJ m -2 d -1 , respectively, although averaged values of air temperature were 18.3°C and 17.8°C for both seasons. However, relative humidity was also somewhat contrasting not because of its averaged values of 78.6% and 76.8%, but rather for its minimum values of 62.5% and 29.5% compared to its maximum values of 93.4% and 93.2%, respectively. Thus, the variation in relative humidity during spring-summer (63%) was roughly twice that for autumn-winter (31%).

Model output uncertainty with Monte Carlo method
HORTSYST model behavior regarding PTI, LAI and DMP is shown in Figs. 3a, 3c, 3e, respectively. Measured values are only reported for LAI and DMP, because PTI is computed during the simulations and has not been measured. The corresponding histograms are also reported (Figs. 3b, 3d, 3f). These results were obtained using the following input variables: global solar radiation, temperature, and relative humidity recorded over the  Table 2). It is worthwhile noting that those quantities were calculated at the end of the cultivation period. In general, the uncertainty of the model predictions was increased with larger uncertainty intervals, as expected. The CV values of all variables were larger for 20% than 10% of the uncertainty variation in the model parameters. Furthermore, all predicted variables have CV values lower than 15% in the case of a 10% variation in the parameter values, which means the model is highly reliable and robust. Low CV values mean a reduction of average values for all predicted variables. Measured values at the end of the cultivation cycle were 6.86 m 2 m -2 for LAI, 1304 g m -2 for DMP, and 291.69 kg m -2 for ETc. When these values were compared with HORTSYST model predictions, namely averaged values of predicted variables in Table 2 with 10% parameter variations, the deviations (difference between observed and estimated values) were: -1.7% for LAI, -1.7% for DMP and -0.3% for ETc. In the case of macronutrients, measured values and their corresponding deviations were: 27.4 g m -2 (1.6%) for N up , 8.59 g m -2 (-6.9%) for P up , 68.76 g m -2 (6.1%) for K up , 20.42 g m -2 (2.9%) for Ca up , and 7.63 g m -2 (3.3%) for Mg up . This means the average predicted values of LAI and DMP were slightly over-estimated and the predicted average values of ETc were close to the measured ones, whereas they were under-estimated for N up , K up , Ca up and Mg up , and only P up was over-estimated with respect to the observed data.       HORTSYT model applied to fertigated tomatoes in a hydroponic greenhouse system When the simulation used larger uncertainty intervals for model parameters (parameters varied 20% around their nominal values), the errors calculated from residuals between estimated (averaged values of predicted variables in Table 2 with 20% parameter variations) and measured values were as follows: 0.1% for LAI, 0.7% for DMP, 1.5% for ETc, 3.2% for N up , -5.0% for P up , 8.2% for K up , 5.0% for Ca up , and 3.9% for Mg up . Except for P up , all the average values predicted by the model were under-estimated. Remarkably, the averaged values of LAI, DMP, and P up were closer to the measured data with larger uncertainty. The average error of N up , P up , K up , Ca up and Mg up was higher with a larger uncertainty interval for model parameters. In the case of ETc, the average was more underestimated when the uncertainty included in the parameters was larger.
The differences between maximum and minimum values, for 10% variation of the parameters (Table 2), were 255.48 MJ m -2 d -1 for PTI, 4.74 m 2 m -2 for LAI, 1014.9 g m -2 for DMP, 297.66 kg m -2 for ETc, 22.94 g m -2 for N up , 8.09 g m -2 for P up , 66.84 g m -2 for K up , 18.27 g m -2 for Ca up , and 7.04 g m -2 for Mg up . In the case of 20% variation (Table 2) these values were 255.48 MJ m -2 d -1 for PTI, 9.3 m 2 m -2 for LAI, 1858.7 g m -2 for DMP, 579.43 kg m -2 for ETc, 50.7 g m -2 for N up , 16.03 g m -2 for P up , 121.94 g m -2 for K up , 32.62 g m -2 for Ca up , and 13.96 g m -2 for Mg up . This shows that the intervals of the predicted values increased more than two-fold with a large variation of 20%. Larger differences between maximum and minimum values of all HORTSYST predicted variables were observed with larger uncertainty intervals for model parameters.
The skewness values were positive for all predicted variables, which means that data are more spread out to the right of the distribution, which was observed in the corresponding histograms for 10% parameter variation (Figs. 3b, 3d and 5f; Figs. 4b, 4d and 4f; Figs. 5b, 5d and 5f). All skewness values were remarkably close to zero, which means all predicted variables fit a normal distribution very well, but with greater variation (20% of uncertainty) of the parameters, more asymmetric distributions are expected. In fact, skewness values are all greater for all variables in the case of 20% parameter variations than for 10% ones. Kurtosis values of predicted variables (Table 2) slightly deviate for both uncertainty intervals. This means that for both situations the behavior of predicted variables is remarkably close to a normal distribution (Figs. 3b,3d and 3f;Fig. 4b,4d and 4f;Fig. 5b, 5d and 5f). Fig. 6 shows HORTSYST model predictions with the GLUE procedure with a 95% confidence interval around measured data when 10% model parameter variation was used. Fig. 7 shows the GLUE uncertainty method outcomes in the case of 20 % parameter variation. It is apparent that with a smaller uncertainty interval for model parameters (10% variation around their nominal values), there was lesser uncertainty in all predicted variables (Fig. 6) than when a larger uncertainty interval (20% variation) was used (Fig. 7). Although scatter plots between each HORTSYST parameter and the predicted variables generated by the GLUE uncertainty method were constructed, only those plots where minimum RMSE values can be identified are shown in Figs. S1-S3 [suppl.]. The best RUE parameter value was between 4.0 and 5.5 MJ m -2 d -1 (Fig. S1c [suppl.]), which corresponds to the smallest values of RMSE. The best value for parameter c1 was between 2.5 and 3.3 m 2 (Fig. S2a [suppl.]), and for c 2 it was between 60 and 85 (Fig. S2b [suppl.]). In the case of B d this was between 15 and 35 W m -2 kPa -1 (Fig. S1d [suppl.]). Figs. S1e and S1f [suppl.] show the parameter values of N up ; a (from 6.0 to 7.5 g m -2 ) and b (from -0.2 to -0.15).

Model output uncertainty with GLUE method
According to the RMSE values shown on the scatter plots, the best value for parameter a 2 was between 0.55 and 0.65 g m -2 (Fig. S2a [suppl.]), for b 2 between -0.08 and -0.05 (Fig. S2b [suppl.]), for a 3 between 2.5 and 3.5 g m -2 (Fig. S2c [suppl.]), for b 3 between 0.08 and 0.12 (Fig.  S2d [suppl.]), for a 4 between 2.5 and -3.5 g m -2 (Fig. S2e [suppl.]), for b 4 between -0.13 and -0.06 (Fig. S2f  [suppl.]), for a 5 between 1.7 and 2.3 g m-2 (Fig. S2g [suppl.]), and for b 5 between -0.14 and -0.07 (Fig. S2h [suppl.]). The parameter b (the slope of nutrient concentration curve) for all macronutrient concentrations were negative, except for K up . The scatter plots (Fig. S3 [ suppl.]) between the plant density values (d) and the RMSE of HORTSYST predicted variables show that the best value for this parameter was between 3 and 4 plants m -2 . Therefore, plant density plays an important role in the model's general performance and it had a major effect on LAI and ETc. A density higher than three plants m -2 yielded better fitness values for N up , P up , K up , Ca up , Mg up and DMP (Fig. S3 [suppl.]), whereas the model's performance was inferior with a density lower than three plants m -2 .

Discussion
According to Monte Carlo uncertainty analysis, with small uncertainty intervals for model parameters the uncertainty of HORTSYST model predictions, quantified by the CV, was in decreasing order: ETc > Mg up > K up > N up > P up > PTI > Ca up > DMP > LAI. The CV for all variables ranged from 10% to 14%. Although similar behavior was observed for larger uncertainty intervals, the range of uncertainty variation (CV values ranged from 22% to 30%) was also larger as expected. In both cases, LAI and DMP were predicted more accurately than Mg up , K up , N up , P up , PTI, Ca up , and ETc. Considering the error of the mean value of Monte Carlo simulations against the measured data for output variables, it is apparent that the HORTSYST model had good predictive ability, even though the parameter values included more uncertainty. However, it was observed that with larger uncertainty the accuracy of LAI, DMP and P up estimations was increased. This is because the uncertainty analysis used nominal values of model parameters (values taken from the literature) instead of parameter values estimated by model calibration. Nominal parameter values can be near or far away from the optimal parameter values. Further work is needed to compare model uncertainty quantification before and after parameter estimation by using an optimization procedure.
The GLUE uncertainty analysis procedure confirmed that HORTSYST predictions are reliable, according to the calculated 95% confidence intervals for both model parameter uncertainty intervals (10% and 20% parameter variations around nominal values). Furthermore, this method provides parameter ranges for fitting model predictions to measured data. The intervals for optimal parameter values were in agreement with those obtained by model calibration (Martínez-Ruiz et al., 2019;2020). Also, the value of the parameters RUE, required for DMP, parameters a and b, which are needed for N up , were found in the ranges reported by Gallardo et al. (2014; for the VegSyst model. According to Pathak et al. (2012), estimation of the GLUE and Monte Carlo uncertainty methods is based on the assumption that model parameters are independent. However, it is unlikely that this is the case since there are other sources of uncertainty, such as the input variables of the HORTSYST model (solar radiation, air temperature and humidity), initial conditions of state variables and the equations that are part of the model structure. Therefore, this work could be further extended by including other uncertain factors.
Based on the results of this work, the Monte Carlo and GLUE uncertainty analysis methods are necessary and complementary approaches since the former does not directly use any measured data of predicted variables in contrast to GLUE in which the use of that information is compulsory.
On the other hand, Lopez et al. (2018) found better fit to the measurement of HORTSYST model against VegSyst (Gallardo et al., 2016), the RMSE and the mean absolute error resulted three times lower. The improvement in quality of the prediction of DMP by HORTSYST model can be explained by the good modeling of LAI and the introduction of PTI as state variable. For the quality of the predictions of the HORTYSYS model it is possible to use it in the development of a decision support system (DSS) for irrigation management and nutrition in greenhouses, like those implemented with the VegSyst model developed by Gallardo et al. (2014) for N management and irrigation in greenhouse crops in Mediterranean climates. Elia & Conversa (2015)  The results obtained in this study indicate that uncertainty analysis using both the Monte Carlo and GLUE methods can help in quantifying uncertainties in HORTSYST model predictions. Due to the small uncertainty associated with the model outputs, the model provides acceptable predictions (and it is reliable) when the nominal values of its parameters are varied between 10% and 20% under two different growing conditions (crop season). However, more research work is needed to determine whether HORTSYST can be applied to irrigation and nutrient supply management in greenhouse tomatoes grown under soilless culture. For example, future studies could consider various crop densities, different irrigation and nutrient levels, and temperature-driven stress conditions.