Soybean yield modeling using bootstrap methods for small samples

A common problem in regression modeling is sample size: because the statistical methods used in inferential analyses are asymptotic, a small sample may compromise the analysis, since the estimates may be biased. An alternative is the bootstrap methodology, which in its non-parametric version does not require guessing or knowing the probability distribution that generated the original sample. In this work we used a small data set of soybean yield and physical and chemical soil properties to determine a multiple linear regression model. Bootstrap methods were used for variable selection, identification of influential points, and determination of confidence intervals for the model parameters. The results showed that the bootstrap methods enabled us to select the physical and chemical soil properties that were significant in the construction of the soybean yield regression model, construct the confidence intervals of the parameters, and identify the points that had great influence on the estimated parameters.
Additional key words: multiple linear regression; model selection; bootstrap global influence diagnosis; bootstrap confidence intervals.
Abbreviations used: AIC (Akaike information criterion); BC (bias corrected); Ca (calcium, cmolc/dm3); Des (soil density, g/cm3; Des1, from 0 to 0.1 m; Des2, from 0.1 to 0.2 m; Des3, from 0.2 to 0.3 m depths); JaB (jackknife-after-bootstrap); K (potassium, mg/dm3); Mg (magnesium, cmolc/dm3); Mn (manganese, mg/dm3); OLS (ordinary least squares); P (phosphorus, mg/dm3); Prod (soybean yield, t/ha); R2Adj (adjusted coefficient of determination); RMSE (root mean square error); SRP1 (soil penetration resistance, MPa) from 0 to 0.1 m depth; SRP2 (soil penetration resistance, MPa) from 0.1 to 0.2 m depth; SRP3 (soil penetration resistance, MPa) from 0.2 to 0.3 m depth.
Authors’ contributions: Conceptualized the paper, statistical analysis of data, final revision and discussion: GHD, MAUO and JAJ. Reviewing the literature and editing the working versions of the manuscript: GHD.
Citation: Dalposso, G. H.; Uribe-Opazo, M. A.; Johann, J. A. (2016). Soybean yield modeling using bootstrap methods for small samples. Spanish Journal of Agricultural Research, Volume 14, Issue 3, e0207. http://dx.doi.org/10.5424/sjar/2016143-8635.
Received: 13 Sep 2015. Accepted: 07 Jul 2016.
Copyright © 2016 INIA. This is an open access article distributed under the terms of the Creative Commons Attribution-Non Commercial (by-nc) Spain 3.0 Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Funding: Araucária Foundation; Council for Scientific and Technological Development (CNPq); Coordination for the Improvement of Higher Education Personnel (CAPES); Post-Graduate Program in Agricultural Engineering (PGEAGRI); Federal Technological University of Paraná (UTFPR).
Competing interests: The authors have declared that no competing interests exist.
Correspondence should be addressed to Gustavo H. Dalposso: gustavodalposso@utfpr.edu.br


Introduction
Soybean (Glycine max (L.) Merrill) is one of the most economically significant crops worldwide (Kulcheski et al., 2016), and multiple linear regression models are constantly being developed to partially explain yield variations in that crop. When modeling soybean yield, variables concerning agricultural meteorology (Penalba et al., 2007; Tao et al., 2008), agriculture (Zheng et al., 2009), management (Lobell et al., 2
made, since traditional inference methods are asymptotic, and standard errors and confidence intervals may be biased in small samples, as explained by Hao & Naiman (2010). Adopting more parsimonious models and determining influential points are procedures that can also give misleading results when working with small samples. Kamo et al. (2013) explain that the Akaike information criterion (AIC; Akaike, 1973) used for model selection has a bias that cannot be ignored, especially with small samples, given that it is derived from asymptotic properties. For diagnostic measures of global influence, one problem concerns their cutoff points: according to Martin & Roberts (2010), they are based on large-sample theory and therefore may not be suitable for small samples.
An alternative to traditional inference methods is the bootstrap, a simulation method developed by Efron (1979) that resamples the sample data set with replacement to perform statistical inferences such as hypothesis testing and determination of confidence intervals (Dubreuil et al., 2014). The bootstrap method has applications in regression analysis (Rahman, 2014), model selection (Al-Marshadi, 2011) and definition of global influence diagnostics (Beyaztas & Alin, 2013).
By comparing the results obtained from bootstrap methods with those of asymptotic methods, Chaves-Neto & Faria (2015) concluded that the bootstrap performed well in samples of all sizes and was superior to the asymptotic method in small samples. Although the bootstrap is a well-known technique and is frequently employed in agricultural studies, as seen in works by Sabaghnia et al. (2010), García-Gallego et al. (2015), Losada et al. (2015) and Sutton et al. (2016), the development of statistical and computing models has led to the study of new techniques based on the bootstrap method.
The objective of this work was to use bootstrap methods to select explanatory variables, investigate the existence of influential points through diagnostic analysis, and obtain confidence intervals for the parameters of a multiple linear regression model for soybean yield, considering physical and chemical soil properties as explanatory variables.

Study area and data
The data used are from the 2013/2014 agricultural year and from a commercial farming area of 167.35 hectares located in the western region of Paraná, Brazil, near the city of Cascavel, with center coordinates latitude 24°57'18''S and longitude 53°34'29''W and an average altitude of 714 m (Fig. 1). The climate in the region is mesothermal and super humid temperate, climate type Cfa (Koeppen), and the soil is classified as a dystropheric

centile method (Efron, 1982) and the BC (bias corrected) method (Efron & Tibshirani, 1986). Efron's percentile interval with confidence level 100(1 − α)% was obtained by ordering the bootstrap replicates of the parameter, θ*_i, i = 1,…,B, and excluding the 100(α/2)% of replicates situated at each end. The technique used to build the BC confidence interval uses a value known as the bias-correction constant to adjust the bootstrap distribution of θ; a roadmap for determining this interval can be found in Shasha & Wilson (2011).
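The percentile interval described above can be illustrated with a minimal Python sketch. This is not the authors' implementation (they report implementing the intervals manually in R); the function name percentile_interval and the toy replicates are assumptions made for illustration, and the indexing rule used here (drop the floor of B·α/2 replicates at each end) is only one of several conventions.

```python
def percentile_interval(replicates, alpha=0.05):
    """Percentile interval from a list of bootstrap replicates.

    Sorts the replicates and discards floor(B * alpha / 2) of them
    at each end; the remaining extremes are the interval limits.
    """
    ordered = sorted(replicates)
    b = len(ordered)
    # Small tolerance guards against floating-point error in B * alpha / 2.
    k = int(b * alpha / 2 + 1e-9)
    return ordered[k], ordered[b - 1 - k]


# Toy replicates (hypothetical values, not data from the paper).
toy_replicates = list(range(1, 101))
lower, upper = percentile_interval(toy_replicates, alpha=0.10)
```

With 100 replicates and α = 0.10, five replicates are dropped at each end, so the limits are the 6th and 95th ordered values.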

Model selection using bootstrap
For model selection, we used the bootstrap method proposed by Austin & Tu (2004), presented in Algorithm 2, which combines bootstrap resampling with automated variable selection methods.
Algorithm 2: Model selection method using bootstrap.
(a) Consider the matrix [Y,X] formed with the original data;
(b) obtain B resamples of this matrix using the paired bootstrap method;
(c) for each resample, fit one model and apply backward selection via AIC;
(d) for each variable, determine how often it was selected in the B models and the percentage of times its estimated parameter had a positive or a negative sign;
(e) use the results of the previous step to determine candidate models and select the best model.
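Step (d) of Algorithm 2 is a tabulation, which the following Python sketch illustrates. It assumes that steps (b) and (c) have already produced one selected model per resample, represented here as a dictionary from variable name to estimated coefficient. The function name selection_summary and the toy coefficients are hypothetical and do not come from the paper.

```python
def selection_summary(models, variables):
    """Selection frequency and coefficient signs for each variable.

    models: list of dicts, one per bootstrap resample, mapping each
            selected variable to its estimated coefficient.
    Sign percentages are relative to the models in which the
    variable was selected.
    """
    n_models = len(models)
    summary = {}
    for var in variables:
        coefs = [m[var] for m in models if var in m]
        selected = len(coefs)
        positive = sum(1 for c in coefs if c > 0)
        negative = sum(1 for c in coefs if c < 0)
        summary[var] = {
            "selected_pct": 100.0 * selected / n_models,
            "positive_pct": 100.0 * positive / selected if selected else 0.0,
            "negative_pct": 100.0 * negative / selected if selected else 0.0,
        }
    return summary


# Four toy "selected models" (made-up coefficients, for illustration only).
toy_models = [
    {"P": -0.07, "Ca": 0.30},
    {"P": -0.05, "Des2": -2.1},
    {"P": -0.06, "Ca": -0.10, "Des2": -1.8},
    {"Ca": 0.25},
]
result = selection_summary(toy_models, ["P", "Ca", "Des2", "K"])
```

A variable that is selected often and almost always with the same sign is a strong candidate for step (e). A variable whose sign is split roughly evenly is a weak one.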

Global influence diagnostics using bootstrap in the response variable
To investigate the existence of influential points, we applied the method proposed by Martin & Roberts (2010), with Cook's distance D_i (Cook, 1977) as the measure of influence. Algorithm 3 shows this method, which is based on the jackknife-after-bootstrap (JaB) technique developed by Efron (1992).
Algorithm 3: Determining the cutoff point for D_i using JaB. (a) Fit the proposed model to the original dataset and estimate D_i, i = 1,…,n; (b) build B bootstrap samples using the paired bootstrap method; (c) (JaB step) for each observation x_i of the original dataset, consider the set of bootstrap samples which do not contain the x_i

red latosol with clay texture (EMBRAPA, 2013). A set of 30 uncorrelated Prod (soybean yield, t/ha) points was used, whose randomness was confirmed by the runs test for randomness (Siegel, 1956). The respective values of the explanatory variables SRP1, SRP2 and SRP3 (soil penetration resistances, MPa, from 0 to 0.1 m, 0.1 to 0.2 m and 0.2 to 0.3 m depths, respectively), Ca (calcium, cmolc/dm3), Mg (magnesium, cmolc/dm3), K (potassium, mg/dm3), P (phosphorus, mg/dm3), Mn (manganese, mg/dm3), and Des1, Des2 and Des3 (soil densities, g/cm3, from 0 to 0.1 m, 0.1 to 0.2 m and 0.2 to 0.3 m depths, respectively) were considered for each yield value. The use of physical and chemical soil properties as explanatory variables is common practice in field surveys, as variations in soil properties account for most crop yield variation, according to Khakural et al. (1999).

Exploratory analysis and modeling
Descriptive statistics of the variables under study were calculated and a multicollinearity analysis of the explanatory variables was performed. A multiple linear regression model was built to describe the relationship between soybean yield and soil properties, with parameters estimated by the ordinary least squares (OLS) method.
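The two fit statistics reported later for these models, RMSE and the adjusted coefficient of determination, can be sketched in Python as follows. The paper does not state which formulas it used, so two textbook definitions are assumed here: RMSE divides by the number of observations, and the adjusted R2 is computed for a model with an intercept and k predictors. The function names and the toy values are illustrative only.

```python
import math


def rmse(observed, predicted):
    """Root mean square error, dividing by the number of observations."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def adjusted_r2(observed, predicted, n_predictors):
    """Adjusted coefficient of determination for a model with an intercept."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    # Penalise R2 for the number of predictors in the model.
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)


# Made-up yields (t/ha) and fitted values, for illustration only.
obs = [4.0, 4.5, 3.8, 4.9, 4.2, 4.6]
fit = [4.1, 4.4, 3.9, 4.7, 4.3, 4.6]
toy_rmse = rmse(obs, fit)
toy_r2_adj = adjusted_r2(obs, fit, n_predictors=2)
```

Because the adjusted R2 penalises additional predictors, it is a convenient way to compare candidate models with different numbers of explanatory variables.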

Paired bootstrap
To determine the bootstrap replicates of the parameters of the regression model we used the paired bootstrap method (Freedman, 1981), presented in the following algorithm.
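As a minimal illustration of paired (case) resampling, the Python sketch below redraws whole (x, y) rows with replacement and refits the model on each resample. It assumes a single predictor for brevity, whereas the model in this work has several. The names ols_simple and paired_bootstrap and the toy data are hypothetical; the authors report using the lm.boot function of the R package simpleboot.

```python
import random


def ols_simple(pairs):
    """Least-squares (intercept, slope) for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def paired_bootstrap(pairs, n_boot=1000, seed=1):
    """Bootstrap replicates of (intercept, slope) by resampling rows."""
    rng = random.Random(seed)
    n = len(pairs)
    replicates = []
    while len(replicates) < n_boot:
        # Draw n rows with replacement; x and y always stay together.
        resample = rng.choices(pairs, k=n)
        # Skip degenerate resamples in which every x is identical.
        if len({x for x, _ in resample}) < 2:
            continue
        replicates.append(ols_simple(resample))
    return replicates


# Toy data (made up for illustration only).
toy = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1), (6.0, 12.0)]
reps = paired_bootstrap(toy, n_boot=200)
```

Resampling rows rather than residuals keeps each response with its own covariates, so the procedure does not rely on the error distribution being the same for every observation.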

Confidence intervals using bootstrap
In order to determine the bootstrap intervals for the parameters of the regression models we used the per-4

Computing resources
The analyses in this work were carried out in the R statistical software (R Core Team, 2014). The bootstrap replicates used to determine the empirical distributions of the model parameters were obtained with the lm.boot function of the simpleboot package (Peng, 2008), and the confidence intervals were implemented manually. The statistics for the model selection method were obtained with the boot.stepAIC function of the bootStepAIC package (Rizopoulos, 2009). The algorithm used to determine the bootstrap cutoff point for Cook's distance was implemented using Cook's distance as calculated by the cooks.distance function of the stats package, and the JaB graphs were implemented by the authors.

Results
Descriptive statistics of the explanatory variables indicated homogeneous behavior of the variables, with no multicollinearity found. The multiple linear regression model of soybean yield, estimated through OLS considering all explanatory variables (Eq. [3]), showed an adjusted coefficient of determination (R2Adj) of 0.41 and a root mean square error (RMSE) of 0.33.
Prod = 8.858 − 0.271 SRP1 + 0.117 SRP2 − 0.003 SRP3 + 0.288 Ca − 0.367 Mg + 1.208 K − 0.067 P − 0.012 Mn − 0.629 Des1 − 2.684 Des2 + 0.925 Des3   [3]
The estimates of the parameters associated with SRP1, SRP3, Des1 and Des2 showed negative signs, indicating that an increase in the value of these variables implies a reduction in soybean yield (Eq. [3]). The estimates of the parameters associated with SRP2 and Des3 in Eq. [3] showed signs opposite to those expected, since they indicate a direct relationship between these variables and soybean yield. The positive sign of the estimate associated with K indicates that, with the other variables held constant, a one-unit increase in K raises soybean yield by 1.208 t/ha (Eq. [3]). The 95% bootstrap confidence intervals for the parameters of the multiple linear regression model were determined using Efron's percentile and BC bootstrap techniques (Table 1).
It was observed that the vast majority of the confidence intervals determined by the bootstrap technique contained zero, indicating that, with the exception of the

sample (approximately B/e groups) and, for each sample of this group, estimate the n values of Cook's distance; group all n·(B/e) values into a single vector; (d) the 2.5% and 97.5% quantiles of the distribution generated by the n·(B/e) values of Cook's distance are used as cutoff points, and if the D_i value is outside this interval then x_i is marked as an influential point.
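The B/e figure in the JaB step follows from a standard result: a bootstrap sample of size n omits a given observation with probability (1 − 1/n)^n, which approaches 1/e as n grows. The Python sketch below evaluates this for the n = 30 observations and B = 3000 replicates reported in this work; the function names are illustrative and not part of the original analysis.

```python
import math


def prob_omitted(n):
    """Probability that one given observation is absent from a
    bootstrap sample of size n drawn with replacement."""
    return (1.0 - 1.0 / n) ** n


def expected_samples_without(n, n_boot):
    """Expected number of the n_boot bootstrap samples that omit
    one given observation."""
    return n_boot * prob_omitted(n)


n_obs, n_boot = 30, 3000
p_omit = prob_omitted(n_obs)
limit = 1.0 / math.e
expected = expected_samples_without(n_obs, n_boot)
```

For n = 30 the probability is slightly below 1/e, giving an expected count of roughly 1085 samples per observation. This is of the same order as the 1124 and 1039 samples reported for points 15 and 10 in the Discussion.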

Jackknife-after-Bootstrap graphic
The JaB technique provides another resource for establishing the effect of individual observations on the bootstrap distribution, through the JaB plot (Efron, 1992). Based on the original [Y,X] dataset, consider the dataset [Y(i),X(i)] obtained by deleting the i-th row of the original dataset, and calculate the statistic of interest, denoted by s(i). The jackknife influence function for the statistic of interest, u_i{s}, is given by Eq. [1]. Intuitively, points with high positive or negative values of u_i{s} have a high influence on the calculated statistic. To provide a clearer interpretation, the relative jackknife influence function (Eq. [2]) is commonly used, with the value two established as the cutoff point (Efron, 1992). These values are sorted in ascending order and marked on the abscissa axis. After calculating the jackknife influence values for each point i of the dataset, seven ordered pairs are determined, namely (u↑_i{s}, P_k), k = {5,10,16,50,84,90,95}, where P_k represents the k-th percentile of the bootstrap distribution formed with the bootstrap replicates calculated from those bootstrap samples which do not contain point i. For each percentile, neighboring ordered pairs are joined to form curves, which are compared with dashed line segments perpendicular to the ordinate axis at the points P_k, k = {5,10,16,50,84,90,95}, calculated from the full bootstrap distribution formed by 3000 bootstrap replicates. The analysis is performed by highlighting those points that surpass the cutoff point and comparing the bootstrap distributions.

Table notes: Amplitude: θ_u − θ_l; θ_l: lower limit; θ_u: upper limit; BC: bias corrected.

variable P, the other explanatory variables may not be individually significant. In search of a more appropriate multiple linear regression model, the bootstrap model selection method was applied with 1000 resamples (Table 2). Of the 1000 models fitted to the bootstrap resamples, with backward selection via AIC applied to each of them, the predictor variable P was selected in 91%, indicating that phosphorus is an important soil attribute for soybean yield prediction. Furthermore, in 100% of the models in which phosphorus was selected, its estimated parameter was negative, indicating that, with the other variables held constant, an increase in phosphorus level implies a reduction in soybean yield.
Other variables selected in most models were Des2, with a selection percentage of 87%, Ca with 81% and SRP1 with 79%. Regarding the signs of the estimated parameters associated with these variables in the models in which they were selected, in 94% of the models in which Ca was selected the sign of its estimated parameter was positive, suggesting that an increase in this variable contributes to increasing soybean yield. For the estimated parameters associated with SRP1 and Des2, the signs were negative in 98% of the models in which they were selected. Some variables may not be useful for explaining soybean yield. For example, among the 1000 models obtained, Des1 was selected in only 500; in 180 of those the estimated parameter was positive and in 320 it was negative. This oscillation indicates that this variable is not significant and can therefore be removed without harming the modeling. A similar case occurs with SRP3: besides being selected in only 460 models, the appropriate sign of its estimated parameter cannot be identified, since 230 models had a positive sign and 230 had a negative sign. Because the estimates of the parameters associated with SRP2 and Des3 showed signs opposite to those expected (Eq. [3]), it is desirable to verify the importance of these variables for modeling purposes. Although positive signs for the estimated parameters of these variables appeared in a large share of the models (80% and 78%, respectively), the selection percentages were not very high (61% and 42%, respectively); there was therefore evidence that they were not significant and could be removed from the modeling (Table 2). In view of these observations, four models were set to be analyzed, namely M81, M79, M75 and M71 (Table 3), each determined by the explanatory variables chosen according to how many times they had been selected in the bootstrap models.
The regressors in the M81 model can explain only 37% of the soybean yield variation, a result lower than

= 0.41) model provided an equivalent level of explanation; however, these models had a higher RMSE than the complete model (RMSE = 0.33), and that difference is most evident in the M79 model (RMSE = 0.39). As the M71 model explained 49% of the soybean yield variation and its RMSE (0.34) is close to that of the complete model (0.33), the M71 model was chosen as the best-fitting model for soybean yield, and an analysis using JaB was performed to investigate the existence of influential points.
No points were detected as influential when the value 1 is established as the cutoff point (Fig. 2). The same is true for the criterion that flags point i as influential if D_i is higher than the median of the Snedecor F distribution with p = 6 and n − p = 24 degrees of freedom, for which the cutoff point is 2.50; thus no points were detected as influential by this criterion either. Considering 4/n ≈ 0.13 as the cutoff point, points 15, 23 and 29 were detected as influential, indicating that these points can change the parameter estimates of the regression model, so it is important to investigate the model's behavior without these points. It should be emphasized that only point 23 was detected as influential through the analysis using Cook's distance (D_i) with the JaB methodology. JaB graphs were created to help identify influential points; they give a visual interpretation of how a particular point affects the bootstrap distribution of the parameter estimates in M71 (Fig. 3). From the graphs in Fig. 3, points 10, 15, 23 and 29 were detected as influential.

Two new models were fitted with the variables P, Des2, Ca, SRP1, Mn and Mg to measure the effect of the influential points on the modeling. The M71-{15,23,29} model was fitted to the data set without points 15, 23 and 29, as these were detected as influential by the traditional Cook's distance method with a cutoff point of 4/n. The M71-{10,15,23,29} model was fitted to the data set without points 10, 15, 23 and 29, as these were considered influential by the analysis using JaB (Table 4).
The M71-{15,23,29} model, fitted to the data set without sample elements 15, 23 and 29, which were identified as influential by the traditional method, is more explanatory than the M71 model obtained from the complete set of points: after removal of these points, the percentage of soybean yield variation that can be explained by the regressors increased from 49% to 63% (Table 4)

(Table 4) it was emphasized that the identification of point 10 as influential and its removal from the data set contributed to a reduction of this statistic, thus resulting in a more accurate model for making predictions. In view of these results

Table notes: β_i: parameter associated with variable i = {P, Des2, Ca, SRP1, Mn, Mg}; P: phosphorus, mg/dm3; Des2: soil density, g/cm3, from 0.1 to 0.2 m depth; Ca: calcium, cmolc/dm3; SRP1: soil penetration resistance, MPa, from 0 to 0.1 m depth; Mn: manganese, mg/dm3; Mg: magnesium, cmolc/dm3; Amplitude: θ_u − θ_l; θ_l: lower limit; θ_u: upper limit; BC: bias corrected.

Discussion
The average soybean yield in the monitored area (4.305 t/ha) is considered high compared with other regions: according to CONAB (2015), in the 2013/2014 agricultural year the average yield was 2.854 t/ha in Brazil and 2.950 t/ha in Paraná.
The negative signs of the estimates for the parameters associated with SRP1, SRP3, Des1 and Des2 (Eq. [3]) are expected, since soil density (Des) shows a direct relationship with SRP (Busscher et al., 1997) and, as SRP has great influence on plant growth, root growth and crop yields vary inversely with its value (Freddi et al., 2006). The estimates of the parameters associated with SRP2 and Des3 show signs opposite to those expected; however, since no multicollinearity was found, it is prudent to investigate the significance of these variables. The positive sign of the estimate associated with K is expected since, according to Pettigrew (2008), potassium is one of the major nutrients considered essential for crop growth and yield development.
The confidence intervals can be compared in terms of their amplitudes, following Paes (1998), for whom a high-amplitude interval indicates lower estimation accuracy than an interval of lower amplitude. Comparing the two bootstrap confidence interval techniques (Table 1), the intervals obtained by Efron's percentile technique showed lower amplitude and are therefore the most accurate. Given that zero is present in most of the confidence intervals (Table 1), it is prudent to investigate whether there are irrelevant variables and/or influential points in the data set, for they increase the parameter variance (Rao, 1971; Meloun & Militký, 2001) and, as a result, the confidence intervals tend to be wider and less accurate.
The fact that the predictor variable P is selected in a large share of the models (Table 2), and that the sign of its estimated parameter is negative in all of them, can be explained by the high phosphorus values found (on average 12 mg/dm3), which according to Popp et al. (2002) may indirectly decrease yields due to micronutrient imbalance. The high percentage (94%) of times in which the sign of the estimated parameter associated with Ca is positive is also expected, because calcium deficiency is among the main factors that inhibit root growth, as reported by Oliveira et al. (2009), especially in latosols. Such a deficiency would leave the plant vulnerable to biotic, biological and nutritional stresses and consequently would lead to reduced yield (Dourado Neto et al., 2014). As SRP1 and Des2 are used to assess the state of soil compaction, their effect on soybean yield is the opposite, for plants exhibit alterations in the depth, branching and distribution of roots in response to soil compaction (Rosolem et al., 2002), which undermines the efficient use of nutrients and water and limits crop yield (Alakukku & Elomen, 1995).
The model selection method using bootstrap is effective in determining the significant variables, resulting in a more parsimonious model. Although the model determined by this method (M71) was the same as that selected by the conventional method using Akaike, the application of this methodology serves to attest that the model selected by the Akaike criterion is not overparameterized, which can occur when the number of samples is small.
Analyzing Fig. 3a, the JaB graph of the bootstrap distribution of the parameter estimates associated with SRP1, points 15 and 10 are detected as influential. Point 15 has a negative influence (−3.7) and its removal reduces the amplitude of the bootstrap distribution, mainly because of a shift in the initial percentiles: for the empirical distribution formed with all 3000 replicates, P5 = −0.373, P10 = −0.330 and P16 = −0.302, whereas for the empirical distribution formed only by replicates from bootstrap samples not containing point 15 (1124 samples), P5 = −0.336, P10 = −0.295 and P16 = −0.270. The influence of point 10 is positive (2.6). When considering the bootstrap distribution formed with those bootstrap samples that do not contain point 10 (1039 samples), the values of the percentiles considered decrease, shifting the distribution and reducing its range from 0.865 to 0.727.
The JaB graph in Fig. 3b for Ca indicated point 23 as a negative influence (−2.8) and point 15 as a positive influence (3.5); thus removal of these points also changes the empirical distribution of the bootstrap estimates. After disregarding the bootstrap replicates obtained from bootstrap samples containing point 23, the initial percentiles increased and the distribution range went from 1.214 to 0.999; after disregarding the replicates obtained from samples containing point 15, the values of the final percentiles decreased, which also reduces the amplitude of the empirical distribution. Analyses of the other graphs (Figs. 3c through 3f) are similar and indicate that point 15 has a negative influence on the bootstrap distributions of the parameters associated with Mg, Mn and Des2; they also indicate that point 23 has a positive influence on the bootstrap distributions of the parameters associated with Mg and Mn and a negative influence on the bootstrap distribution of the parameter associated with P. Point 29 has a positive influence on the bootstrap distribution of the parameter associated with P, and point 10 has a positive influence on the bootstrap distribution of the parameter associated with Des2.
Comparing all the points detected as influential in the JaB graphs (Fig. 3), sample element 15 stands out for its influence on most bootstrap distributions of the estimated parameters; only the distribution of the parameter associated with P is not influenced by excluding this element. Sample elements 23 and 10 also stood out as influential on several confidence intervals, and sample element 29 is the least influential of the four. When sample element 29 is taken out of the empirical distribution of the bootstrap replicates, only the bootstrap distribution of the parameter estimates associated with P is influenced, with its range reduced from 0.163 to 0.112.
Regarding diagnostic analysis, the method for determining influential points using the JaB methodology together with Cook's distance (Fig. 2) does not identify some points clearly highlighted as influential by traditional methods and JaB graphs. Thus, JaB graphs prove to be a good alternative for identifying influential points: besides identifying the influential points with greater accuracy than the traditional analysis, they also provide information on the bootstrap distributions of the parameter estimates, making it possible to see what happens to the confidence intervals when the influential observations are excluded.
Regarding the significance of the explanatory variables in the M71-{10,15,23,29} model, only the parameter associated with Mn showed confidence intervals containing zero in both bootstrap techniques (Table 5), suggesting that it may be irrelevant. This suspicion can be ruled out, since there is evidence that the sign of the parameter associated with this variable is negative: zero appeared at the upper end of the intervals, and in the variable selection method (Table 2) Mn was selected in 75% of the models (750 models) and had a negative sign in 94% of them (705 models). The two techniques used to determine confidence intervals behaved similarly, though Efron's percentile method stands out for providing intervals of lower amplitude. Cunha & Colosimo (2003) also highlight Efron's percentile method for determining confidence intervals in regression models with measurement errors since, according to the authors, this method stands out for its greater simplicity with performance equal to that of the others.
It is noteworthy that the bootstrap methods were fundamental to obtaining a more descriptive and more accurate model: besides providing a higher percentage of explanation of soybean yield (65%) than the initial model in Eq. [3] (41%), the M71-{10,15,23,29} model has a lower RMSE and is therefore more accurate. The explanatory power of the M71-{10,15,23,29} model is satisfactory, considering that it was built only from physical and chemical soil properties. The percentage of soybean yield variation not explained by this model (35%) is due to variables not considered, for example agricultural-meteorological variables, since climate has a significant impact on the growth and development of crops (Hoogenboom, 2000). The agricultural-meteorological variables were not included in this study because of the limited spatial representativeness of results obtained from data collected at weather stations (Junges & Fontana, 2011).
The results showed that the bootstrap methods enabled us to select the physical and chemical soil properties that were significant in the construction of the soybean yield regression model, construct the confidence intervals of the parameters, and identify the points that had great influence on the estimated parameters.