Statistical models to describe the fruit growth pattern in sweet orange ‘Valencia late’

The objective of the present work was to find the statistical model that best describes the fruit growth pattern of sweet orange ‘Valencia late’ in the departments of Concepción (orchard 1) and General Paz (orchard 2), province of Corrientes, Argentina. To fit the growth curves, sigmoid-type models (Logistic, Gompertz, Weibull, Morgan-Mercer-Flodin and Richards) and reparameterizations of the Logistic and Gompertz models were evaluated and compared. As model selection criteria, measures of nonlinearity and estimates of residual variance were compared for the different models and reparameterizations. The model found to be most suitable to describe ‘Valencia late’ orange fruit growth was the fifth parameterization of the Logistic model. In this model, β and γ had similar values for all fruit sizes in orchard 1 but different values between fruit sizes in orchard 2; α values varied between fruit sizes in both orchards. For this reason, a family of curves will be necessary for different situations.
Additional key words: citrus fruits, logistic equation, measures of nonlinearity.


Introduction
To predict citrus production volume, several fruit characteristics must be known. One of them, of great importance for the precision of predicted volumes, is the size that fruits will have at the end of the fruiting period (Alvarez et al., 2001). Knowledge of fruit growth curves allows a description of fruit development in the season and zone under study and an estimation of fruit weight at harvest. The curves that represent growth vary with the species. In citrus fruits, the form depends on their origin (parthenocarpic or sexual), but in general the response has a simple sigmoid form (Agustí, 2000).
In sweet orange fruits, the sigmoid curve covers the stage from anthesis to maturation, which is characterized by three distinct periods. The first period, of exponential growth, runs from anthesis to the end of physiological fruit set and is characterized by fast growth caused by cell division. The second period, of linear growth, runs from the end of physiological fruit set until shortly before colour change, and its duration varies with the variety (2 months in early varieties and 5-6 months in late varieties such as 'Valencia late'). The third period is characterized by a reduced growth rate, with changes associated with maturation (Agustí et al., 2003).
Sigmoid curves are frequently used in biology, agriculture and economics to describe growth. Such curves begin at a certain point and increase their growth rate monotonically until reaching an inflection point, after which the growth rate decreases and the function approaches an asymptotic value (Ratkowsky, 1983). The mathematical functions proposed to model these curves include the Logistic, Gompertz, Richards (1959) and Morgan-Mercer-Flodin (MMF) (1975) models and models derived from the Weibull distribution, with different parameterizations that confer particular characteristics and constitute families of curves (Ratkowsky, 1983).
Among the authors who have studied fruit growth, Bramardi et al. (1997) used the measures of nonlinearity developed by Bates and Watts (1980) and selected a Logistic model in its third reparameterization, according to Ratkowsky (1983), to describe pear growth for cvs. William's and Packham's Triumph in the Alto Valle de Río Negro and Neuquén, Argentina. Alvarez and Boche (1999) obtained a generalization of simple sigmoid models to fit the diameter growth of a late nectarine variety in Neuquén, Argentina, and fitted a generalized logistic function with five parameters. Casierra-Posada et al. (2004) fitted third-degree polynomial models to fresh and dry weight, polar diameter/suture diameter ratio, and branch growth as functions of days after full flowering for peach cv. Conservero in Paipa, Colombia. The coefficient of determination (R²) was used as the fitting criterion, with values greater than 0.97 in every case. García Petillo and Castel (2004) used fruit size (volume in cm³) as a variable to compare different irrigation levels in plots of 'Valencia' orange in San José, Uruguay, during five seasons. To generalize the effect of the treatments, they fitted a logistic model to the growth curves and evaluated the significance of its three parameters across treatments and years. All parameters of the Logistic model were significant and differed significantly between years and treatments.
Regression analysis is a statistical method that finds a mathematical expression relating two or more variables and explaining one of them through the rest (Draper and Smith, 1981). The method of least squares is most often used to fit linear models, but when nonlinear models are involved the resulting equations are nonlinear and, in general, difficult to solve. For this reason, the sum of squared residuals is minimized using iterative procedures. A method frequently used in nonlinear regression algorithms is linearization of the nonlinear function, followed by the iterative Gauss-Newton method to estimate the parameters (Montgomery et al., 2004).
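The iterative scheme described above can be illustrated with a minimal sketch. It is not the SAS procedure used in this work: the logistic form y = a/(1 + exp(b - g·x)), the step-halving safeguard, the synthetic noise-free data and all function names are assumptions made only for illustration.

```python
import math

def logistic(x, a, b, g):
    """Logistic curve y = a / (1 + exp(b - g*x)) (assumed parameterization)."""
    return a / (1.0 + math.exp(b - g * x))

def jacobian_row(x, a, b, g):
    """Partial derivatives of the curve with respect to a, b and g at one x."""
    e = math.exp(b - g * x)
    d = 1.0 + e
    return [1.0 / d, -a * e / d ** 2, a * x * e / d ** 2]

def sse(xs, ys, theta):
    """Sum of squared residuals for the parameter vector theta."""
    return sum((y - logistic(x, *theta)) ** 2 for x, y in zip(xs, ys))

def solve(A, v):
    """Solve the small system A s = v by Gaussian elimination with pivoting."""
    n = len(v)
    M = [row[:] + [rhs] for row, rhs in zip(A, v)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    s = [0.0] * n
    for r in reversed(range(n)):
        s[r] = (M[r][n] - sum(M[r][k] * s[k] for k in range(r + 1, n))) / M[r][r]
    return s

def gauss_newton(xs, ys, theta, max_iter=100, tol=1e-10):
    """Linearize the model, solve the normal equations, update, and repeat."""
    current = sse(xs, ys, theta)
    k = len(theta)
    for _ in range(max_iter):
        J = [jacobian_row(x, *theta) for x in xs]
        r = [y - logistic(x, *theta) for x, y in zip(xs, ys)]
        JtJ = [[sum(row[i] * row[j] for row in J) for j in range(k)] for i in range(k)]
        Jtr = [sum(row[i] * ri for row, ri in zip(J, r)) for i in range(k)]
        step = solve(JtJ, Jtr)
        lam = 1.0
        while lam > 1e-8:  # halve the step until the SSE decreases
            trial = [t + lam * s for t, s in zip(theta, step)]
            trial_sse = sse(xs, ys, trial)
            if trial_sse < current:
                break
            lam /= 2.0
        else:
            return theta  # no improving step found; treat as converged
        theta, current = trial, trial_sse
        if max(abs(lam * s) for s in step) < tol:
            break
    return theta

# Synthetic, noise-free weekly "diameters" (hypothetical values, not study data).
TRUE = [70.0, 2.0, 0.02]
xs = list(range(60, 334, 7))
ys = [logistic(x, *TRUE) for x in xs]
estimate = gauss_newton(xs, ys, [65.0, 1.8, 0.018])
print(estimate)
```

Starting values matter in practice: a poor starting point can make the normal equations nearly singular or the exponential overflow, which is one reason initial estimates are usually derived from the data before fitting.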
To evaluate the fit of a nonlinear model, the classic criteria used in linear regression must be considered: the coefficient of determination (R²), significance of parameters, residual distribution and magnitude of the estimated residual variance (σ²). In addition, nonlinearity measures such as those presented by Bates and Watts (1980), which quantify the departure from linearity of a data set-model-parameterization combination, and the estimate of the Box bias should be considered (Ratkowsky, 1983; Montgomery et al., 2004). Assuming a nonlinear model given by Y = ƒ(X, θ) + ε, the intrinsic nonlinearity (IN) measures the curvature of E(Y) in the sample space as θ changes (the expected response surface or solution locus). In a linear model the solution locus is linear. IN is defined for a specific data set and model (Ratkowsky, 1989). Another measure, the parameter-effects nonlinearity (PE), indicates how regularly points are spaced on the solution locus when θ changes by constant increments. In a linear model, constant increments of the parameter vector produce equally spaced points ƒ(X, θ). PE is determined by the form in which the parameters appear in the model, that is, it depends on the selected parameterization (Ratkowsky, 1989).
A criterion to decide whether a value of IN or PE is small is to compare it with the critical value 1/√F, where F is taken from Snedecor's F distribution with numerator degrees of freedom equal to the number of parameters and denominator degrees of freedom equal to the number of data points minus the number of parameters (Ratkowsky, 1989).
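As a small numerical illustration of this criterion, the sketch below assumes the critical value has the form 1/√F and backs out the F quantile implied by the two critical values reported in this work (0.5906 and 0.5838). The function names are hypothetical, and the degrees of freedom behind each quantile are not stated in the text, so none are assumed here.

```python
import math

def critical_value(f_quantile):
    """Critical curvature 1/sqrt(F) for a given upper quantile of the F distribution."""
    return 1.0 / math.sqrt(f_quantile)

def implied_f(critical):
    """F quantile that would produce a given critical value under the 1/sqrt(F) form."""
    return 1.0 / critical ** 2

# Critical values reported for orchards 1 and 2 in this work.
reported = {"orchard 1": 0.5906, "orchard 2": 0.5838}
implied = {name: implied_f(c) for name, c in reported.items()}
for name, f_value in implied.items():
    print(name, round(f_value, 4))
```

Both implied quantiles fall between 2.8 and 3.0, a range typical of tabulated 5% points of F with three numerator degrees of freedom and a few dozen denominator degrees of freedom, which is consistent with three-parameter models fitted to weekly measurements.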
The objective of the present work was to find the statistical model that best describes the growth pattern of 'Valencia late' orange fruits in the departments of Concepción and General Paz, province of Corrientes, Argentina.

Materials and methods
In both locations, plots in good sanitary and productive condition were selected and managed with common regional cultural practices. The plots had 11-yr-old trees at a density of 357 trees ha⁻¹, planted at 4 × 7 m spacing (orchard 1), and 26-yr-old trees at a density of 312 trees ha⁻¹, planted at 4 × 8 m spacing (orchard 2).
In both orchards, 30 trees were randomly selected and 745 fruits distributed at the top were identified to represent all size ranges. Equatorial diameter (mm) was measured weekly with digital calipers until harvest, starting at 60 days after full flowering (DAFF; 80% of flowers open) in orchard 1 and at 50 DAFF in orchard 2. Data are expressed in DAFF so that comparisons are based on phenological stages rather than chronological dates.
Because the complexity of nonlinear regression techniques makes analysis of the complete data set difficult or impossible, five fruits in each of three size ranges (defined according to the distribution of sizes at harvest in each orchard) were randomly selected for each orchard: small (fruits 1-5), medium (6-10) and large (11-15). This follows the methodology of Bramardi et al. (1998), Barrozo et al. (2000) and Yeragani et al. (2003), who worked with small data sets to fit nonlinear equations.

Analyzed models
Tables 1 and 2 list the statistical models analyzed, which correspond to the first parameterization and some of the reparameterizations of the most frequently mentioned initial forms (Draper and Smith, 1981; Ratkowsky, 1983, 1989; Montgomery et al., 2004).
In the different models and reparameterizations (Tables 1 and 2), the parameters (α, β, γ, δ) can be interpreted in terms of aspects of the growth curves. Alpha (α) is related to the superior asymptote (SA). Beta (β) is related to the intercept on the Y-axis (INT). Gamma (γ) is related to the speed at which the curve grows from an initial value (magnitude of β) to a final value (magnitude of α), and thus indicates the growth rate (GR). Delta (δ), present only in the four-parameter models, gives the model additional flexibility to fit the data (Ratkowsky, 1983).
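For orientation, the block below writes out the sigmoid models in the forms commonly attributed to their first parameterizations in the sources cited above, with y the equatorial diameter and x the DAFF. These forms are given as an assumption: the exact notation of Tables 1 and 2 may differ, and the reparameterizations (L2-L6, G2) are not reproduced.

```latex
\begin{align*}
\text{Gompertz:} \quad & y = \alpha \exp\!\left[-\exp(\beta - \gamma x)\right] \\
\text{Logistic:} \quad & y = \frac{\alpha}{1 + \exp(\beta - \gamma x)} \\
\text{Richards:} \quad & y = \frac{\alpha}{\left[1 + \exp(\beta - \gamma x)\right]^{1/\delta}} \\
\text{Morgan-Mercer-Flodin:} \quad & y = \frac{\beta\gamma + \alpha x^{\delta}}{\gamma + x^{\delta}} \\
\text{Weibull:} \quad & y = \alpha - \beta \exp\!\left(-\gamma x^{\delta}\right)
\end{align*}
```

In each of these forms y approaches α as x grows (for positive γ and δ), which is why α is read as the superior asymptote.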

Statistical analysis
The models were fitted with the NLIN procedure of the SAS statistical package (SAS Institute, 2000), using the Gauss-Newton computational method. Initial values for the estimators were calculated following the methodology of Ratkowsky (1983).
Table 1. Models analyzed for sigmoid-type growth curves. Columns: Models, Equation; recoverable entries: Gompertz (G1), Logistic. y = equatorial diameter of fruit; x = DAFF; α, β, γ, δ = parameters.
To verify the goodness of fit of the different models, classic criteria such as R², significance of parameters (t test), residual distribution and magnitude of σ² were analyzed. The magnitudes of IN and PE for each model were compared, and their significance was evaluated against the critical value 1/√F, where F = F(p, v, α) with v = n - p, n = number of observations in time and p = number of parameters (Ratkowsky, 1989). These values measure the degree of nonlinearity of a model-parameterization; if they are non-significant, the model can be considered to behave like a linear model, with all the properties that this implies. The values of the bias estimators for the different reparameterizations of the selected models were compared. The nonlinearity measures and the bias estimators were calculated with a program developed by Bramardi et al. (1997), which uses the IML procedure of SAS (SAS Institute, 2000).
To examine differences between the parameters obtained with the selected parameterization of the Logistic model (L5), the likelihood ratio test proposed by Regazzi and Silva (2004) was used, employing the NLIN procedure of SAS (SAS Institute, 2000). The test consists of estimating the residual sums of squares for the complete (SQRcom) and restricted (SQRrest) models; the resulting statistic, computed from SQRcom and SQRrest, approximately follows a chi-square distribution whose degrees of freedom are determined by p com and p rest, the numbers of parameters estimated in the complete and restricted models, respectively (Regazzi and Silva, 2004).
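A minimal sketch of such a likelihood ratio test follows. It assumes the form in which this kind of test is usually written, namely a statistic of -n·ln(SQRcom/SQRrest) compared with a chi-square distribution on p com - p rest degrees of freedom, where n is the total number of observations. That form, the function names and every number in the example are assumptions for illustration, not values from this study.

```python
import math

def likelihood_ratio_test(n, sqr_com, sqr_rest, p_com, p_rest):
    """Return the (assumed) statistic -n*ln(SQRcom/SQRrest) and its degrees of freedom."""
    if not 0 < sqr_com <= sqr_rest:
        raise ValueError("expected 0 < SQRcom <= SQRrest")
    statistic = -n * math.log(sqr_com / sqr_rest)
    return statistic, p_com - p_rest

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

# Hypothetical example: three size groups with three parameters each (9 in the
# complete model) against a restricted model with one parameter shared (7).
statistic, df = likelihood_ratio_test(
    n=117, sqr_com=400.0, sqr_rest=410.0, p_com=9, p_rest=7
)
p_value = chi2_sf_even_df(statistic, df)
print(statistic, df, p_value)
```

The restricted model can never fit better than the complete one, so the statistic is non-negative; a large value indicates that forcing the parameter to be common across groups costs too much fit.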

Results
Figs. 1a and 1b present equatorial diameter (mm) as a function of days after full flowering (DAFF) in orchards 1 and 2. In both orchards the curves are similar, with a superior asymptote that depends on final fruit size. However, in orchard 1 the initial values (60 DAFF) are similar for the different fruit sizes, whereas in orchard 2 the initial values (50 DAFF) differ for each fruit size.
In fitting the Logistic and Gompertz models and their respective reparameterizations, there were no convergence problems with the Gauss-Newton method. The MMF and Richards models did not converge for any fruit. The Weibull model converged for only one fruit in orchard 1 and for 11 fruits in orchard 2, but the estimated parameters were not significantly different from zero. Therefore, the four-parameter models (MMF, Richards and Weibull) were not considered for further analysis.
With respect to the classical model selection criteria, R² varied from 0.94 to 0.96 for the Logistic models and from 0.95 to 0.97 for the Gompertz models. Values were higher for the Gompertz models, but the differences were not significant. For both models all parameters were highly significant according to the t test (p<0.01). The distributions of the Gompertz and Logistic residuals show homoscedasticity and normality, with a cyclic tendency. Both models have low values of σ² in both orchards and for all fruits, with slightly lower values for the Gompertz model (Table 3). Table 3 also shows the IN and PE measures for the Gompertz and Logistic models in their initial parameterizations. For all models evaluated in their initial parameterizations (Table 3), IN is less than the critical value 1/√F at a type I error of α = 0.05. IN was exceeded by PE in all cases (Table 3); therefore the other proposed reparameterizations of each model were analyzed. Table 4 shows PE for the different reparameterizations of the Logistic (L2, L3, L4, L5 and L6) and Gompertz (G2) models. A reduction in PE is observed with the L5 and G2 parameterizations. For L5, PE values are below the critical value in all fifteen fruits in orchard 1 (1/√F = 0.5906) and in thirteen of fifteen fruits in orchard 2 (1/√F = 0.5838). For G2, PE values are below the critical value in twelve of fifteen fruits in orchard 1 and in seven of fifteen fruits in orchard 2 (Table 4).
The percentage Box bias of the parameter estimators for the G2 and L5 models was negligible in all cases, which indicates that the estimators are practically unbiased; it was therefore not a useful tool for choosing between the models.
According to the criteria used to evaluate the fit of the models, the most adequate model to describe the growth pattern of 'Valencia late' orange fruit is L5. Therefore, only the L5 parameterization was considered in the likelihood ratio test for the parameters of the models fitted to the different orchards and fruit sizes. In both orchards, the estimates of α for the different fruit sizes were significantly different (p<0.0001). The estimates of β and γ for the different fruit sizes were not significantly different in orchard 1 (p=0.2920 and p=0.2986, respectively) but were significantly different in orchard 2 (p<0.0001 and p=0.0300, respectively). Table 5 shows the estimates of the parameters α, β and γ and the related aspects of the growth curves for the L5 average growth curves by orchard and fruit size. SA increases with fruit size and has similar values in both orchards. INT and GR have similar values for the different fruit sizes in orchard 1, but in orchard 2 INT increases and GR decreases with fruit size.

Discussion
The growth curves constructed from the sampled fruits are similar for both orchards. Their shape is not a typical sigmoid as described by Agustí (2000), because they lack a clearly defined inferior asymptote. This can be attributed to the fact that measurements began after phase I, the period of exponential growth (60 DAFF in orchard 1 and 50 DAFF in orchard 2), which is the moment at which the existence of a fruit can be established with confidence and the fruit reaches a size that allows the equatorial diameter to be measured (Agustí et al., 2003). The lack of convergence of the Gauss-Newton method for the four-parameter models may be due to fruit growth not following a strongly sigmoid pattern over the observed range of measurement, which would imply that these models are overparameterized for describing the growth pattern of orange fruits. Similar results were found by Bramardi et al. (1997) for pear fruits cvs. William's and Packham's Triumph.
Classical model selection criteria were not useful for evaluating goodness of fit. R² values were similar for the Gompertz and Logistic models, and analogous to the values obtained by Casierra-Posada et al. (2004) when fitting polynomial models to growth curves for peach cv. Conservero. However, authors such as Draper, Healy (1984), and Helland (1987) consider the R² criterion inadequate for evaluating the fit of nonlinear models. The results of the t test for all parameters in both models were similar to those obtained by García Petillo and Castel (2004), who fitted logistic models to growth curves of 'Valencia' orange. The distributions of the Gompertz and Logistic residuals show adequate homoscedasticity and normality. The cyclic tendency found can be associated with repeated measurements on the same fruits over time, an aspect that must be considered when the selected model is used for inference (Montgomery et al., 2004). In both orchards, the estimates of σ² were higher for the Logistic than for the Gompertz model; however, this criterion is considered less important than the measures of nonlinearity (IN and PE) when selecting a model. Moreover, in both models the dispersion values are very low (the highest value of σ² was 6.7813). In addition, estimation errors will not be relevant when estimating sizes at harvest. The values of σ² are generated by the distances between the observed points and the fitted curves; they are related to the sampled fruits and vary from one sample to another (Montgomery et al., 2004).
Critical value for α = 0.05: 1/√F = 0.5906 (orchard 1) and 0.5838 (orchard 2), according to degrees of freedom. * Significant values (α = 0.05).
Table 5. Estimation of parameters α, β and γ and the related aspects of the growth curves, for L5 model average growth curves by orchard and fruit size.

The non-significant IN for the initial parameterizations of the Gompertz and Logistic models indicates that the curvature of the solution locus is negligible, so the models are not far from linear in form. The presence of a PE nonlinearity greater than IN suggests that PE dominates the nonlinearity of the models, which is an advantage because PE can be reduced by an appropriate reparameterization (Ratkowsky, 1989).
According to the PE values (Table 4), the best reparameterization was L5 for the Logistic models and G2 for the Gompertz models. Over both orchards, PE was below the critical value of Ratkowsky (1989) in 93.3% of the fruits for the L5 parameterization and in 63.3% of the fruits for the G2 parameterization. From the point of view of nonlinear behaviour, which must be prioritized when the final objective is inference, L5 is the most satisfactory parameterization and should have better inferential properties.
The selection of the Logistic model in its fifth parameterization (L5) as the most adequate to describe the growth pattern of 'Valencia late' orange fruit agrees with the results of Bramardi et al. (1997), who fitted a Logistic model in its third parameterization (L3), according to Ratkowsky (1983), to describe the growth of pear fruits cvs. William's and Packham's Triumph in the Alto Valle de Río Negro and Neuquén, Argentina. Alvarez and Boche (1999) found that a generalized logistic function with five parameters was the model best suited to describe the growth of nectarine (cv. Sun Grand) fruits in Neuquén, Argentina. García Petillo and Castel (2004) fitted a logistic model in its original parameterization to describe 'Valencia' orange fruit growth in San José, Uruguay.
The L5 parameters are interpreted according to Ratkowsky (1983) as follows. α is related to the superior asymptote (SA); in this parameterization SA is its inverse (1/α). β relates the superior and inferior asymptotes and describes the Y-value corresponding to X = 0 (INT); in this parameterization INT is given by 1/(α + e^β). γ is related to the rate of growth (GR) from the initial values (magnitude of β) to the final values (magnitude of α); in this parameterization GR is given by -ln γ.
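These three relations can be checked numerically with a short sketch. The curve form y = 1/(α + e^β·γ^x) used below is an assumption: it is one form consistent with the stated interpretations (value 1/(α + e^β) at x = 0, limit 1/α for 0 < γ < 1, rate -ln γ), not a form confirmed by the text. The function names and parameter values are hypothetical.

```python
import math

def l5_curve(x, alpha, beta, gamma):
    """Assumed L5 form: y = 1 / (alpha + exp(beta) * gamma**x)."""
    return 1.0 / (alpha + math.exp(beta) * gamma ** x)

def l5_aspects(alpha, beta, gamma):
    """Growth-curve aspects as interpreted in the text for the L5 parameters."""
    return {
        "SA": 1.0 / alpha,                      # superior asymptote
        "INT": 1.0 / (alpha + math.exp(beta)),  # value of y at x = 0
        "GR": -math.log(gamma),                 # growth rate
    }

# Hypothetical parameter values chosen so that the asymptote is 70 mm.
alpha, beta, gamma = 1.0 / 70.0, -2.0, 0.98
aspects = l5_aspects(alpha, beta, gamma)
print(aspects)
print(l5_curve(0, alpha, beta, gamma), l5_curve(2000, alpha, beta, gamma))
```

Because SA, INT and GR are nonlinear functions of α, β and γ, equal parameter values imply equal curve aspects, but differences in the parameters do not translate into the aspects on the same scale.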
The non-significant differences in β and γ in orchard 1 imply that the intercept position and the growth rate are homogeneous across the curves for the different fruit sizes. The significant differences in β and γ between fruit sizes in orchard 2 indicate that the intercept position and the growth rate depend on fruit size. The parameter α showed significant differences in both orchards, reflecting the expected variation in the superior asymptote between the curves for different fruit sizes; these properties can be clearly seen in the curves of Fig. 1a. For this reason, a family of curves will be necessary for different orchards and fruit sizes.
An extension of the present work could be to fit the selected model to fruits classified according to their commercial size at harvest and to construct growth tables. The behaviour of the estimated parameters of the selected model for each fruit could also be analyzed, considering factors such as age and season.