Artificial neural networks in the prediction of soil chemical attributes using apparent electrical conductivity

Aim of study: To use artificial neural networks (ANN) to predict the values and spatial distribution of soil chemical attributes from apparent soil electrical conductivity (ECa) and soil clay contents.
Area of study: The study was carried out in a 1.2-ha area cultivated with cocoa, located in the state of Bahia, Brazil.
Material and methods: Data were collected on a sampling grid containing 120 points. Soil samples were collected to determine the attributes clay, silt, sand, P, K+, Ca2+, Mg2+, S, pH, H + Al, SB, CEC, V, OM and P-rem. ECa was measured using the electrical resistivity method at three different times relative to soil sampling: 60 days before (60ECa), 30 days before (30ECa) and when collecting soil samples (0ECa). Models based on ANN were used to predict the chemical and physicochemical attributes of the soil, with ECa and clay contents as input variables. The quality of the ANN predictions was determined using different statistical indicators. Thematic maps were constructed for the attributes determined in the laboratory and those predicted by the ANNs, and the values were grouped using the fuzzy k-means algorithm. The agreement between classes was assessed using the kappa coefficient.
Main results: Only the P and K+ attributes correlated with all ANN input variables. ECa and soil clay contents proved to be good variables for predicting soil attributes.
Research highlights: The best results in the prediction of the P and K+ attributes were obtained with the combination of ECa and clay content.
Abbreviations used: ANN (artificial neural networks); CEC (cation exchange capacity); CV (coefficient of variation); ECa (apparent soil electrical conductivity); MPE (modified partition entropy); PA (precision agriculture); RME (mean relative error); SB (sum of exchangeable bases).


Introduction
The soil is a dynamic and highly variable system and its surface layers are more delicate and dynamic when compared to the rest of its matrix (Daniel et al., 2003). Soil properties, like any dynamic and complex process, present a wide spatio-temporal variation (Santos et al., 2017), which requires special attention in the adoption of management practices aimed at reaching the productive potentials of agricultural areas.
In precision agriculture (PA), soil fertility must be characterized from different attributes and, mainly, from a high number of samples for each attribute (Silva & Lima, 2012). A high sampling density is costly and, depending on the financial resources available for this stage, can make PA unfeasible, especially for attributes whose determination requires complex laboratory analyses.
The use of sensors has grown in application and importance because it reduces both the costs of PA systems and the time required for decision-making. In some cases, the sensor information is available in real time or within a short time (Medauar et al., 2020). Among the sensors used in PA, those that measure the apparent electrical conductivity (ECa) of the soil stand out, since the spatial variability of ECa is highly correlated with that of different soil attributes (Grubbs et al., 2019).
ECa can be measured in a minimally invasive way and describes the spatial distribution patterns of different soil properties that are important for the management of agricultural crops (Serrano et al., 2017; Grubbs et al., 2019). Because it reflects the capacity of the soil to conduct electricity through solid particles, through exchangeable cations at the solid-liquid interface of clay minerals and through the soil solution (Corwin & Lesch, 2003; Stadler et al., 2015), it can be used to predict the values and distribution of different soil attributes, guiding decision-making for supplementary fertilization.
Prediction methods are common in several areas of knowledge, and their range of applications has grown in recent years. Among these methods, artificial neural networks (ANN) have been used by several authors to predict soil attributes from different variables (Daniel et al., 2003; Guo et al., 2013; Kolassa et al., 2018; Ng et al., 2019). Jafarzadeh et al. (2016), evaluating different methods for the prediction of the cation exchange capacity (CEC) in an agricultural soil, and Aitkenhead & Coull (2016), predicting the carbon stock in the soils of Scotland, concluded that ANNs present low deviations in the prediction of soil attributes and faithfully represent the phenomena under study.
Despite the potential of ECa to describe physical and chemical attributes of the soil, few studies have investigated the potential of this variable for the prediction of attributes that describe its fertility, especially using computational intelligence models. ECa has been widely used in PA, mainly for the description of the spatial variability of soil attributes and/or for the design of management zones for different agricultural crops. Uribeetxebarria et al. (2018) used ECa and multivariate analysis of variance as diagnostic tools for the physical-chemical composition of soils and the variability of attributes in areas of fruit growing. Sanches et al. (2018) described the soil pH and built, through spatial analysis, models for the recommendation of lime, using ECa as a predictor variable. Bottega et al. (2017) outlined management zones for soybean using ECa as an input variable, associated or not with soil texture.
The association between soil sensors and models based on computational intelligence, such as ANN, is an important tool for predicting the soil attributes that control the fertility of agricultural areas, reducing costs in PA without losing precision in decision-making on management practices. Given this context, the aim of this work was to use ANNs to predict the values and spatial distribution of chemical attributes of the soil from ECa and soil clay contents.

Material and methods
The study was carried out in an area of approximately 1.2 ha located in the southern region of the state of Bahia (Brazil), at the central coordinates 14°47' S and 39°16' W. Cocoa (Theobroma cacao L.) is cultivated at a spacing of 3.0 × 1.5 m and Erythrina sp. at a spacing of 24 × 24 m.
The region's climate is classified as Af-type, tropical-humid, with an average annual precipitation of 1830 mm, relative air humidity of around 80% and the average annual temperature ranging between 21.5 and 25.5°C (Köppen & Geiger, 1928). The soil is classified, according to the Brazilian Soil Classification System, as Eutroferric Haplic Nitisol (Santos et al., 2017).
For data collection in the area, an irregular sampling grid was constructed, totaling 120 sample points spaced 9.5 m on the x axis and 6.6 m on the y axis. Each sampling point corresponded to a cocoa plant, and the coordinates were defined using a local referencing system (local coordinates): one point was fixed and the positions of the others were defined by their distance from it in a Cartesian plane.
Soil ECa was measured and soil samples were collected in the projection of the cocoa canopy at each of the 120 sampling points in the grid. The canopy projection was taken as the area delimited by a radius of 0.40 m from the stem of the cocoa trees.
The soil samples were collected with a Dutch-type auger in the 0-0.20 m layer, using four sub-samples (one per quadrant) to form a composite sample. The soil of the composite sample was homogenized in a plastic bucket, from which approximately 0.5 kg was removed and packed in plastic bags. The soil samples were air dried, broken up with the aid of a roller and passed through a 2 mm mesh sieve to obtain air-dried fine soil. Afterwards, the samples were sent to a commercial laboratory to determine physical, physicochemical and chemical attributes of the soil.
ECa was measured by the electrical resistivity method, using the Wenner array (Corwin & Hendrickx, 2002; Corwin & Lesch, 2003). This method is based on the insertion of four equally spaced electrodes at the soil surface. An electric current is applied between the outer electrodes and the potential difference is measured between the inner electrodes. For the measurement, a portable ERM-02 conductivity meter manufactured by Landviser was used, with electrodes spaced 0.20 m apart, which corresponds to a measurement depth equal to the electrode spacing.
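As an illustration of the Wenner principle described above, the following minimal sketch converts one four-electrode reading into ECa using the standard Wenner relation, apparent resistivity = 2·π·a·V/I. The function name and the numerical reading are hypothetical, and the sketch does not claim to reproduce the internal processing of the ERM-02 meter.

```python
import math

def wenner_eca(current_a, voltage_v, spacing_m):
    """Apparent electrical conductivity (S/m) from one Wenner-array reading."""
    # Apparent resistivity of a Wenner array, in ohm m: rho = 2 * pi * a * V / I
    resistivity = 2.0 * math.pi * spacing_m * voltage_v / current_a
    # Conductivity is the reciprocal of resistivity
    return 1.0 / resistivity

# Hypothetical reading: 1 mA injected, 200 mV measured, electrodes 0.20 m apart
eca_ms_per_m = wenner_eca(0.001, 0.2, 0.20) * 1000.0  # S/m converted to mS/m
print(round(eca_ms_per_m, 2))
```

With the assumed reading, the result is on the order of a few mS m-1, the same order of magnitude as the values reported for the study area.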

Precision agriculture in cacao
The physical attributes analyzed were the granulometric fractions of the soil: clay, silt and total sand. The following chemical and physicochemical attributes of the soil were also analyzed: P, K+, Ca2+, Mg2+, S, active acidity (pH in water), potential acidity (H + Al), sum of exchangeable bases (SB), CEC, base saturation (V), organic matter (OM), and remaining P (P-rem). The methods of laboratory analysis, as well as the extractors for the determination of soil attributes, followed the recommendations of EMBRAPA (2017).
Soil samples were collected in December, immediately after the end of the main cocoa harvest (the period of highest cocoa production in the year). ECa was measured at three distinct times relative to the soil sampling period: a) sixty days before soil sampling (60ECa); b) thirty days before soil sampling (30ECa); and c) on the same day as soil sampling (0ECa).
At each ECa measurement time, 30 soil samples were collected to determine the gravimetric soil water content. The samples were packed in dry plastic packages, weighed and dried in a forced-air oven at 105 °C until they reached a constant mass, after which the soil water content was determined.
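The gravimetric determination described above reduces to a single ratio, sketched below under the assumption that both masses are net of the container mass. The function name and the example masses are hypothetical.

```python
def gravimetric_water_content(wet_mass_g, dry_mass_g):
    """Mass of water per unit mass of oven-dry soil (g/g)."""
    if dry_mass_g <= 0:
        raise ValueError("dry mass must be positive")
    # Water lost on drying divided by the oven-dry soil mass
    return (wet_mass_g - dry_mass_g) / dry_mass_g

# Hypothetical sample: 125 g moist, 100 g after drying at 105 °C to constant mass
theta = gravimetric_water_content(125.0, 100.0)
print(theta)
```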
The soil attribute data and the ECa data measured in the different periods were submitted to exploratory statistical analyses to verify the presence of discrepant values. The interquartile range was used for this evaluation, and values that deviated from the frequency distribution of the data were removed. The values of ECa and granulometric fractions were also submitted to descriptive statistical analysis to determine the measures of position, dispersion and distribution shape. Normality was tested using the Kolmogorov-Smirnov test at the 5% probability level.
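A minimal sketch of an interquartile-range screen is given below. The text does not state the fence multiplier, so the conventional Tukey value of 1.5 is an assumption here, as are the function name and the example readings.

```python
import statistics

def remove_outliers_iqr(values, k=1.5):
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR]; k = 1.5 is an assumed default."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lower <= v <= upper]

# Hypothetical ECa readings (mS/m) with one discrepant value
readings = [3.1, 3.4, 3.6, 3.8, 4.0, 4.1, 4.3, 9.5]
cleaned = remove_outliers_iqr(readings)
print(cleaned)
```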
After the removal of the outliers, the data were normalized to mean zero and standard deviation equal to one (Leal et al., 2015). This normalization was performed to equalize the scales, since the absolute differences between the scales of the variables could compromise the analyses used to achieve the proposed objectives.
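The normalization to zero mean and unit standard deviation, together with the inverse step needed later to return predictions to their original units, can be sketched as follows. The use of the sample standard deviation and all names and values are assumptions made for illustration.

```python
import statistics

def zscore_fit(values):
    """Return the mean and sample standard deviation used for scaling."""
    return statistics.mean(values), statistics.stdev(values)

def zscore_transform(values, mean, std):
    """Scale values to zero mean and unit standard deviation."""
    return [(v - mean) / std for v in values]

def zscore_inverse(z_values, mean, std):
    """Return normalized values to their original unit of measurement."""
    return [z * std + mean for z in z_values]

# Hypothetical clay contents (%)
clay = [18.0, 22.0, 25.0, 30.0, 35.0]
mu, sigma = zscore_fit(clay)
z = zscore_transform(clay, mu, sigma)
restored = zscore_inverse(z, mu, sigma)
```

Keeping the fitted mean and standard deviation of each variable is what makes the later return to absolute values possible.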
The normalized results of ECa and soil water contents were correlated using Pearson's linear model at the 5% probability level. The absolute values and, mainly, the spatial distribution of ECa are influenced by soil water content, and this relationship should be considered in studies that intend to use conductivity to predict soil fertility (Brevik et al., 2006).
To evaluate the relationship between the different sets of variables, a correlation analysis was performed between the normalized values of ECa and textural fractions and the chemical and physicochemical attributes of the soil. Pearson's linear correlation was used at the 5% probability level. This analysis was performed mainly to select the most appropriate variables for the prediction models: the attributes selected were those that correlated simultaneously with ECa and with the clay content of the soil.
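The selection rule above can be sketched as follows. Because the sketch uses only the standard library, the significance test relies on the Fisher z-transform, a large-sample approximation that stands in for the exact test and is not necessarily the procedure used in the study. All names and data values are hypothetical.

```python
import math
from statistics import NormalDist, mean

def pearson_r(x, y):
    """Pearson's linear correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def approx_p_value(r, n):
    """Two-sided p-value from the Fisher z-transform (approximation)."""
    if abs(r) >= 1.0:
        return 0.0
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

def select_targets(inputs, candidates, alpha=0.05):
    """Keep the candidate attributes that correlate significantly with every input."""
    selected = []
    for name, values in candidates.items():
        p_values = [approx_p_value(pearson_r(x, values), len(values))
                    for x in inputs.values()]
        if all(p < alpha for p in p_values):
            selected.append(name)
    return selected

# Hypothetical normalized data for eight sampling points
inputs = {
    "ECa": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "clay": [8.0, 7.0, 6.5, 5.0, 4.2, 3.0, 2.1, 1.0],
}
candidates = {
    "P": [1.1, 2.0, 2.9, 4.2, 5.1, 5.8, 7.2, 8.0],
    "pH": [5.0, 6.0, 5.0, 6.0, 5.0, 6.0, 5.0, 6.0],
}
print(select_targets(inputs, candidates))
```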
For the prediction of chemical and physicochemical attributes of the soil, models based on ANN of the perceptron type were used. Perceptron networks are simple neural networks consisting of an input layer and an output (target) layer; a weight is assigned to each input, and the output values are obtained as the sum of the products of the inputs and their respective weights. The back-propagation learning rule was used, which iteratively seeks the minimum variance between the expected outputs and those predicted by the neural network (Haykin, 1999).
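A minimal sketch of such a network is shown below: a single linear output unit whose weights are adjusted by gradient descent on the squared error, which is what back-propagation reduces to when there is no hidden layer. The learning rate, the number of epochs, the bias term and the synthetic data are assumptions; the actual architecture and software used in the study are not specified in the text.

```python
def train_perceptron(samples, targets, lr=0.05, epochs=2000):
    """Fit a weighted sum of the inputs by per-sample gradient descent on the squared error."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, t in zip(samples, targets):
            y = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = t - y
            # Move each weight along the negative gradient of the squared error
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

def predict(weights, bias, x):
    """Network output: sum of the inputs multiplied by their weights, plus bias."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

# Hypothetical normalized inputs (ECa, clay) and a synthetic target 0.6*ECa - 0.3*clay
X = [(-1.0, 0.5), (-0.5, -1.0), (0.0, 1.0), (0.5, -0.5), (1.0, 0.0), (1.5, 1.5)]
T = [0.6 * a - 0.3 * b for a, b in X]
w, b = train_perceptron(X, T)
```

On this noise-free synthetic target the fitted weights approach the generating coefficients; real soil data would leave a residual error.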
The ECa values measured in the different periods (60, 30 and 0 days before soil sampling) and the clay contents were used as input variables. Clay content was chosen as an input variable because this fraction is directly related to the availability of all attributes that account for fertility, offering chemical stabilization of the soil. These variables were used individually and in association, as described in Table 1.
The output variables (target) were selected from the results of Pearson's linear correlation analysis between the input variables (considered individually) and the normalized values of the chemical and physicochemical attributes of the soil. Only those that correlated significantly with all input variables were used in ANN analysis as output variables.
After prediction by the ANNs, the values of the chemical and physicochemical attributes were "denormalized" to return to their absolute values, respecting the unit of measurement of each output variable. The results were submitted to descriptive statistical analysis to evaluate the measures of position, dispersion and distribution shape of the real variables and of those predicted by the ANNs. Data normality was tested by the Kolmogorov-Smirnov test at the 5% probability level.
To construct thematic maps of the chemical and physicochemical attributes of the soil determined in laboratory analyses and of those predicted by the ANNs, the values at each sampling point were submitted to geostatistical analysis. The aim was to verify the existence of spatial dependence and, where present, to quantify its degree by fitting theoretical functions to Matheron's classical experimental variograms, under the stationarity assumption of the intrinsic hypothesis.
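Matheron's classical estimator, gamma(h) = sum of squared differences / (2 N(h)), can be sketched as below. The lag classes, the function name and the transect data are hypothetical, and the fitting of a theoretical model to the resulting points is not shown.

```python
import math

def experimental_variogram(coords, values, lag_width, n_lags):
    """Return (mean distance, semivariance, pair count) for each non-empty lag class."""
    sq_sums = [0.0] * n_lags
    dist_sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            h = math.dist(coords[i], coords[j])
            k = int(h / lag_width)  # lag class [k*w, (k+1)*w)
            if k < n_lags:
                sq_sums[k] += (values[i] - values[j]) ** 2
                dist_sums[k] += h
                counts[k] += 1
    result = []
    for k in range(n_lags):
        if counts[k]:
            # gamma(h) = sum((z_i - z_j)^2) / (2 * N(h))
            result.append((dist_sums[k] / counts[k],
                           sq_sums[k] / (2 * counts[k]),
                           counts[k]))
    return result

# Hypothetical transect: four points 1 m apart
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
values = [1.0, 2.0, 4.0, 7.0]
variogram = experimental_variogram(coords, values, lag_width=1.0, n_lags=3)
```

In this example the semivariance grows with distance, the pattern that indicates spatial dependence.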
Once spatial dependence was confirmed, the data were interpolated to estimate values for non-sampled locations. Interpolations were performed using the geostatistical method of ordinary kriging. The values were interpolated so as to generate matrices of the same order for all variables, in which each element of each matrix represented one pixel of a map.
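The ordinary kriging estimate at one unsampled location can be sketched as follows. The spherical model and its parameters are assumptions (the fitted models are not reported in this section), the small Gaussian-elimination solver is included only to keep the example self-contained, and all names and data are hypothetical.

```python
import math

def spherical(h, nugget, sill, rng):
    """Spherical semivariogram model (an assumed choice of theoretical function)."""
    if h == 0:
        return 0.0
    if h >= rng:
        return sill
    ratio = h / rng
    return nugget + (sill - nugget) * (1.5 * ratio - 0.5 * ratio ** 3)

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ordinary_kriging(coords, values, target, gamma):
    """Return the kriged estimate at target and the kriging weights."""
    n = len(values)
    # Semivariances between samples, bordered by the unbiasedness constraint (weights sum to 1)
    a = [[gamma(math.dist(coords[i], coords[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [gamma(math.dist(c, target)) for c in coords] + [1.0]
    weights = solve(a, b)[:n]  # the last unknown is the Lagrange multiplier
    return sum(w * v for w, v in zip(weights, values)), weights

# Hypothetical samples at the corners of a 10 m square; estimate at the center
coords = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
values = [10.0, 20.0, 30.0, 40.0]
model = lambda h: spherical(h, 0.0, 1.0, 20.0)
estimate, weights = ordinary_kriging(coords, values, (5.0, 5.0), model)
```

Repeating the estimate over a regular set of target locations yields the matrix in which each element corresponds to one map pixel.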
To evaluate the efficiency of the estimates obtained from the values determined in laboratory analyses and those predicted by the ANNs, the following statistical indicators of efficiency were calculated:
-Mean relative error (RME, %): this indicator evaluates the relationship between the value measured in laboratory analysis at position x_i and the value predicted by the ANN at the same position, and is determined according to the equation:
$$\mathrm{RME} = \frac{100}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|$$
in which y = value measured in laboratory analysis; ŷ = value predicted by ANN, and n = number of observations.
-Willmott's index of agreement (d): this index relates the difference between the value measured in laboratory analysis at position x_i and the value predicted by the ANN at the same position, and is determined according to the equation:
$$d = 1 - \frac{\sum_{i=1}^{n} (\hat{y}_i - y_i)^2}{\sum_{i=1}^{n} \left( |\hat{y}_i - \bar{y}| + |y_i - \bar{y}| \right)^2}$$
in which y = value measured in laboratory analysis; ŷ = value predicted by ANN; ȳ = mean of the values measured in laboratory analysis, and n = number of observations. The index d varies between 0 and 1, and the greater its value, the better the agreement between measured and predicted values (Willmott et al., 1985).
-Linear correlation coefficient (r): Pearson's linear correlation was calculated, at the 5% probability level, between the values measured in laboratory analysis and the values predicted by ANN. The coefficient varies between -1 and 1, and the higher its absolute value, the stronger the relationship between the two data sets.
-Performance index (PI): the product of the correlation coefficient and Willmott's index of agreement:
$$\mathrm{PI} = r \cdot d$$
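The four indicators can be computed as in the sketch below, which assumes the usual forms of the mean relative error and of Willmott's index (absolute relative deviations for RME, Willmott's original squared form for d). Function names and data are hypothetical.

```python
import math

def rme(measured, predicted):
    """Mean relative error, in percent."""
    n = len(measured)
    return 100.0 / n * sum(abs((y - p) / y) for y, p in zip(measured, predicted))

def willmott_d(measured, predicted):
    """Willmott's index of agreement (0 to 1)."""
    mean_y = sum(measured) / len(measured)
    num = sum((p - y) ** 2 for y, p in zip(measured, predicted))
    den = sum((abs(p - mean_y) + abs(y - mean_y)) ** 2
              for y, p in zip(measured, predicted))
    return 1.0 - num / den

def pearson_r(x, y):
    """Pearson's linear correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def performance_index(measured, predicted):
    """PI = r * d."""
    return pearson_r(measured, predicted) * willmott_d(measured, predicted)

# Hypothetical measured and predicted P contents (mg/dm3)
measured = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
print(rme(measured, predicted), willmott_d(measured, predicted),
      performance_index(measured, predicted))
```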
To evaluate the agreement between the maps of the data measured in laboratory analysis and those of the data predicted by ANN, the interpolated values were submitted to a cluster analysis using the fuzzy c-means model (also known as fuzzy k-means), which uses the Euclidean distance to calculate the proximity between the samples. This method is based on the minimization of the following equation, according to Guastaferro et al. (2010):
$$J = \sum_{i=1}^{N} \sum_{j=1}^{k} u_{ij}^{m} \, d_{ij}^{2}$$
where N is the number of data; m is the fuzzy weighting exponent; k is the number of classes; and d_ij^2 is the squared Euclidean distance between the sample point x_i and the centroid of class C_j. The fuzzy membership element u_ij is subject to the following restrictions for all i = 1 to N and all j = 1 to k:
$$\sum_{j=1}^{k} u_{ij} = 1, \qquad u_{ij} \in [0, 1]$$
The number of classes was tested and defined for the groupings of the interpolated values from the data measured in laboratory analysis. After the best number of classes was chosen, it was replicated for all values predicted by the ANNs.
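The alternating updates that minimize this objective can be sketched as follows. The deterministic initialization, the exponent m = 2, the stopping tolerance and the one-dimensional data are assumptions made for illustration, not settings reported by the authors.

```python
import math

def fuzzy_c_means(points, k, m=2.0, iterations=100, tol=1e-6):
    """Return centroids and the membership matrix u (rows: points, columns: classes)."""
    # Deterministic start (an assumption): spread the initial centroids over the sorted data
    ordered = sorted(points)
    centroids = [ordered[(2 * j + 1) * len(ordered) // (2 * k)] for j in range(k)]
    u = []
    for _ in range(iterations):
        # Membership update from squared distances: u_ij proportional to d_ij^2 ** (-1/(m-1))
        u = []
        for p in points:
            d2 = [max(sum((a - b) ** 2 for a, b in zip(p, c)), 1e-12)
                  for c in centroids]
            inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
            total = sum(inv)
            u.append([v / total for v in inv])
        # Centroid update: mean of the points weighted by u_ij ** m
        new_centroids = []
        for j in range(k):
            w = [row[j] ** m for row in u]
            sw = sum(w)
            new_centroids.append(tuple(
                sum(wi * p[dim] for wi, p in zip(w, points)) / sw
                for dim in range(len(points[0]))))
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:
            break
    return centroids, u

# Hypothetical interpolated values forming two groups
points = [(1.0,), (1.2,), (0.8,), (5.0,), (5.2,), (4.8,)]
centroids, u = fuzzy_c_means(points, k=2)
labels = [row.index(max(row)) for row in u]  # class of highest membership
```

Each row of the membership matrix sums to one, which is the restriction imposed on u_ij; the memberships returned are those computed in the last pass of the loop.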
Groupings with two, three, four and five classes were tested for interpolated values from the data measured in laboratory analysis, and the number of classes was evaluated with the fuzziness performance index (FPI) and the modified partition entropy (MPE). The FPI describes the degree of membership sharing between any pair of fuzzy sets (FPI = 1 corresponds to maximum inaccuracy and FPI = 0 to no inaccuracy). The MPE describes the certainty (or uncertainty) of fuzzy k-partitions (MPE = 1 corresponds to maximum uncertainty and MPE = 0 to maximum certainty). The ideal number of management zones is obtained when both indices are minimal.
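Both indices can be computed from a membership matrix. The text does not give their formulas, so the sketch below assumes the definitions commonly used in the management-zone literature: FPI = 1 - (kF - 1)/(k - 1), with F the mean of the squared memberships, and MPE equal to the partition entropy divided by ln k. The function names and the two matrices are hypothetical.

```python
import math

def fpi(u):
    """Fuzziness performance index (assumed definition): 0 = crisp, 1 = fully shared."""
    n, k = len(u), len(u[0])
    f = sum(v * v for row in u for v in row) / n  # partition coefficient
    return 1.0 - (k * f - 1.0) / (k - 1.0)

def mpe(u):
    """Modified partition entropy (assumed definition): entropy scaled by ln(k)."""
    n, k = len(u), len(u[0])
    h = -sum(v * math.log(v) for row in u for v in row if v > 0) / n
    return h / math.log(k)

# Hypothetical membership matrices for two points and two classes
crisp = [[1.0, 0.0], [0.0, 1.0]]
uniform = [[0.5, 0.5], [0.5, 0.5]]
print(fpi(crisp), mpe(crisp), fpi(uniform), mpe(uniform))
```

Under these definitions, computing both indices for k = 2 to 5 and keeping the k that minimizes them reproduces the selection rule stated above.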
To analyze the agreement between the maps classified for the data measured in laboratory analyses and those predicted by the ANNs, a pairwise comparison (observed × predicted) was performed between maps grouped with the same number of classes. The agreement was analyzed through the kappa coefficient according to Kitchen et al. (2005) and Valckx et al. (2009). The kappa coefficient indicates the superiority of the reclassification over a random classification and expresses the agreement between clusters and, for interpolated data, between spatial distributions.
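The kappa coefficient for two classified maps can be sketched as below, using Cohen's formulation in which the observed agreement is corrected by the agreement expected by chance. The function name and the pixel classes are hypothetical, and the sketch assumes that the two maps do not both consist of a single class.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two classifications of the same pixels."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from the marginal class proportions of each map
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1.0 - expected)

# Hypothetical classes of ten pixels in the observed and predicted maps
observed_map = [1, 1, 1, 2, 2, 2, 3, 3, 3, 3]
predicted_map = [1, 1, 2, 2, 2, 2, 3, 3, 3, 1]
kappa = cohen_kappa(observed_map, predicted_map)
print(round(kappa, 3))
```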

Results and discussion
The mean values for ECa (Table 2) were close for the three evaluation periods, with an average amplitude of 0.64 mS m-1. The lowest mean value was observed for the ECa measured on the same day as soil sample collection (0ECa) and the highest for the ECa measured 60 days before (60ECa).
For the granulometric fractions, the highest mean values were observed for total sand. The clay contents presented the lowest mean values, and the silt contents a substantial mean (38.73%). Based on the averages of the granulometric fractions, the soil of the area can be classified as having a loam texture, according to the textural classification proposed by EMBRAPA (2018). High silt contents are usually associated with a lower degree of pedogenetic development of soils (Grego et al., 2011) and, consequently, with the degree of weathering of the soil.
Generally speaking, the ECa averages observed for the different evaluated periods are within the expected values, given the textural composition of the soil. The highest values of ECa are generally found in soils with higher clay contents, with a high and positive correlation between these two variables (Moral et al., 2010). In soils with higher sand content, the ECa values tend to be lower, since electrical conductivity decreases (that is, resistivity increases) as the availability of electrical charges in the soil physical matrix is reduced.
Behavior similar to that of the mean values was observed for the variability of ECa measured by the CV. The values for this statistic were medium to low, indicating reduced variation of the values around the mean. For soil texture, however, CV values differed greatly among fractions: silt presented the lowest variation and clay the highest.
The results of the linear correlation analysis between ECa and soil water content in the different evaluation periods are presented in Table 3. The mean values of soil water content were very close for the evaluated periods. The same behavior is observed for the CV values, which, in addition to being close, are low. Table 3 shows the absence of a linear correlation between ECa and soil water content. These results, added to the low variation of the water content, indicate that the variation of ECa is not influenced by the variation of the water content in the soil, and that there is no cause and effect relationship between these variables in this study. A low variability of soil water content at the time of ECa measurement points to a greater correlation of this variable with soil attributes (Lück et al., 2009).
For the correlation between ECa and the chemical, physical and physicochemical attributes of the soil (Table 4), the values of Pearson's coefficient were significant only for P, K+, Mg2+ and silt. The correlations were positive and very similar between periods for P, K+ and silt, while for Mg2+ the correlation was negative. ECa is an indirect measure capable of describing the physical and chemical condition of soils, mainly because it correlates with the attributes that determine these conditions (Moral & Serrano, 2019). Several authors have studied these correlations and verified, in different soil types, the possibility of using ECa to infer the variability of soil properties (Serrano et al., 2017; Grubbs et al., 2019; Sanches et al., 2019).
In a study conducted in different topographic positions, Singh et al. (2016) found a significant and positive linear correlation between ECa and the exchangeable levels of P and Mg, but no correlation with K. In a study in a sugarcane area, Sanches et al. (2019) observed a correlation of ECa with the exchangeable levels of K. As in this study, these authors did not find a significant correlation between ECa and all soil chemical attributes.

Table 4. Linear correlation between soil attributes and apparent electrical conductivity measured in different periods.
[1] P-rem: remaining P; OM: organic matter; SB: sum of exchangeable bases; CEC: cation exchange capacity; V: base saturation; 0ECa: measurement on the same day as the soil sample collection; 30ECa: measurement 30 days before soil sample collection; 60ECa: measurement 60 days before soil sample collection. * significant at the level of 5% probability.

ECa is mainly controlled by ions close to soil constituents; therefore, correlations with the properties of this physical environment are generally observed (Moral et al., 2010). In this sense, ECa can be used for targeted soil sampling, as a secondary variable to estimate the spatial distribution of a main variable (Fortes et al., 2015) and to predict soil attributes (Grubbs et al., 2019). Despite these possibilities, the natural variability of soils interferes with the relationships between ECa and soil attributes, so that the practical applications of ECa must be evaluated for each condition.
The granulometric fractions, unlike ECa, correlated with a high number of chemical and physicochemical attributes of the soil. Generally speaking, the correlations with soil attributes were positive for total sand and negative for clay and silt. Terrón et al. (2011), mapping ECa in an agricultural soil, observed a strong correlation between ECa and P and K+ contents. These authors warn, however, that because ECa is influenced by several factors, the relationships between ECa and soil attributes should be interpreted for each production field, seeking the attributes that most influence it.
Correlations between physical and chemical soil attributes are common and expected, given that fertility variation is a product of soil physical properties and these, in turn, come from pedogenetic processes and interference in the cation structure of soils (Silva SA et al., 2010a).
The correlations between the ECa values measured in different periods were high and significant for all combinations, indicating that the behavior of this variable was stable over time, despite differences in absolute values. Temporal stability is an important factor in describing the distribution trend of a variable (Blackmore et al., 2003), ultimately allowing the prediction of spatial patterns to be used in decision-making on agronomic management practices.
In order to use the values of ECa and soil clay content together to predict soil attributes, only the attributes that correlated significantly with both variables and in all periods were considered. From the results in Table 4, only the exchangeable attributes P and K+ showed significant correlation with ECa (in the three study periods) and with the clay contents; ANN prediction models were therefore built only for these two attributes.
The real data of P and K+ (determined in laboratory analyses) and those predicted by the ANNs using the ECa measured in different periods and its combination with the clay contents were subjected to descriptive statistical analyses. The results of this analysis are presented in Table 5.
The mean values for the attributes determined in the laboratory (real) and those predicted by the ANNs are very close, with variations less than one unit for all combinations of input variables. Similar behavior was observed for the median and the maximum distribution values.
For the minimum values of the distribution, similarity with the actual data was not observed in all prediction scenarios, notably in those predicted by 0ECa, 30ECa, 30ECaCLAY, 60ECa and 60ECaCLAY for phosphorus and by 0ECa, 30ECa and 60ECa for potassium. Because these combinations could not reproduce the amplitude of variation of the soil attributes, they were the ones that presented the greatest changes in the coefficient of variation and, consequently, in the values of asymmetry, kurtosis and the Kolmogorov-Smirnov test statistic.
To assess the prediction efficiency of the artificial neural networks, the results were related, pixel by pixel, to the values determined in the laboratory (Table 6). Comparing the quality of the predictions by the ANNs, the results for K+ are closer to the real values than those for P. In all combinations for P prediction the RME values were higher than 16%, while for K+ only 0ECa, 30ECa and 60ECa presented RME greater than 10%. The same behavior was observed for the other indexes, with few exceptions.
Regarding the efficiency indicators for the different combinations of inputs, for the prediction of P the lowest value of RME (16.40%) was observed for the combination 0_60ECa, while the highest values of d (0.96), r (0.92) and PI (0.89) were observed for the combination 0_30_60ECaCLAY. For K+, the best values for all indices were observed for the combination 0_30_60ECaCLAY, with small errors and high performance indices.
The Willmott index values indicated medium to high agreement for all combinations and for both attributes studied. The lowest value was 0.54, for the prediction of K+ from the conductivities assessed individually in each measurement period. For all other combinations and periods, d values were greater than 0.65, with 50% of the combinations presenting values above 0.9 for both attributes.
For the correlation coefficients, more than 60% of the combinations for the prediction of P had values above 0.80. For K+, more than 57% of the combinations presented values above 0.80. In general, the correlation values were high for all scenarios (r ≥ 0.54). The correlation between real values and those predicted by neural networks tends to be higher for sets of variables that are correlated with each other (Braga et al., 2011).
For all attributes, the combinations that involved ECa and clay contents showed better results. For K+, however, the effect of adding clay as an input resulted in greater gains for the performance indexes when compared to the gains for the prediction of P. These results are directly related to the correlation values observed between ECa and clay with soil attributes, where the values for clay were higher when correlated with K+ (Table 4).
The use of ECa as the only input variable resulted in lower prediction performance, with worse values for the statistical efficiency indicators. For the prediction of P, however, with the exception of the scenarios using a single ECa reading (0ECa, 30ECa and 60ECa), the scenarios that used clay as an auxiliary input in the network architectures showed results close, for all indexes, to those of the scenarios without clay. In this sense, it is possible to affirm that, in order to use ECa as the only input variable for P prediction, it is necessary to carry out more than one reading in time.
In general, the use of the ECa-Clay combination almost always returns more reliable results for the predicted attributes, with emphasis on the greater contribution to the architecture of the ANNs for potassium. The clay fraction generally presents a strong correlation with soil attributes, especially P and K+, due to adsorption and fixation processes, respectively, which indicates that this attribute is effective for describing their spatial variability (Silva SA et al., 2010a).
There is a tendency for errors to decrease and for the other indexes to improve as the number of input variables in the model increases; however, in some cases, the differences are not large when compared to scenarios with fewer input variables. An example is the difference between the combinations 0_30ECaCLAY and 0_30_60ECaCLAY (best results), for which the RME values are very close and the other indices were very close for P and equal for K+.

Table 5. Descriptive statistics of phosphorus and potassium attributes determined by laboratory analysis and estimated by artificial neural networks.
0ECa: measurement on the same day as the soil sample collection; 30ECa: measurement 30 days before soil sample collection; 60ECa: measurement 60 days before soil sample collection; CV: coefficient of variation; Sc: asymmetry coefficient; Kc: kurtosis coefficient; K-S: Kolmogorov-Smirnov index.
In the maps of Fig. S1 [suppl], the relationships between ECa and clay with the attributes P and K+ are evident. Because it is a flat area, the variability observed for the variables is associated with the process of soil formation and with management practices.
The ECa maps were very similar for all periods under study, indicating that the temporal variation of this variable was low, with classes equally distributed over time. ECa is distributed over the area with values from 2.4 to greater than 5.7 mS m-1, with reduced values in the central portion of the area. Medeiros et al. (2018), studying the spatio-temporal behavior of ECa in two fields cultivated with sugarcane, found values for this variable over wider intervals than those observed in this study. In an area cultivated with coffee, Valente et al. (2014) found values similar to those of this work, including for the correlation with soil chemical attributes.
As previously discussed, the correlation values between ECa and the levels of P and K+ were significant but moderate (Table 4), which is evident in the maps in Fig. S1 [suppl]. With the exception of the northeastern portion of the area, the distribution of the exchangeable levels of the chemical attributes of the soil is similar to that of ECa, with a reduction in the values in the east-west direction of the area.
For clay contents, the values follow an increasing gradient in the east-west direction of the area, which confirms the inverse relationship indicated by the correlation (Table 4) with the exchangeable levels of P and K+. The values of P and K+ increase in the west-east direction of the area, with well-defined and continuous limits.
Almost all of the area has P values in the classes from 5 to 25 mg dm⁻³ and from 25 to 31 mg dm⁻³. Only a small proportion of the area has values greater than 65 mg dm⁻³ of P. The values of exchangeable P in the study area are within the recommended limits for most agricultural crops, especially for cocoa cultivation (Chepote et al., 2013). This result can be attributed to the physical and mineralogical characteristics of the soil under study, where the predominance of low-activity clay and the higher concentration of sand and silt contribute to a greater availability of P in solution (Ernani et al., 2007).

K⁺ follows a spatial behavior similar to that of P, with higher values in the eastern region of the area and following an increasing gradient as it moves to the western region. This similarity in the spatial distribution of P and K⁺ is not always observed and may be related to the activity of clays and the mineralogical composition of the soil. In a mineralogical survey of the soil in the study area, Carvalho Filho et al. (1987) identified the predominance of the minerals mica, feldspar, amphibole and apatite. Micas and feldspars are minerals that contain K, which may partly explain the availability of this nutrient in the soil solution. In a work to map the chemical attributes of an Oxisol, Silva SA et al. (2010b) observed the opposite behavior of P and K, explained by the greater clay activity and the presence of minerals such as kaolinite and iron and aluminum oxides.

After the prediction of the exchangeable levels of P and K⁺ by the ANNs, thematic maps were constructed for each prediction scenario in order to describe their spatial distribution and analyze their agreement with the real maps. The results of this analysis are presented in the maps of Figs. S2 and S3 [suppl] for P and K⁺, respectively.
In the maps in Fig. S2 [suppl], which represent the spatial behavior of P predicted by the ANNs with the different input variables, the predictions from ECa alone were the ones that least describe the spatial variation of P when compared to the map generated from the values determined in laboratory analysis (Fig. S1 [suppl]). This behavior corroborates what was previously discussed about the low efficiency of using ECa on its own to predict the availability of P.
In the other prediction scenarios, which involve the combination of ECa readings and also the combination of ECa with the clay content, the maps show a visual similarity to the map generated from the values determined in laboratory analyses (Fig. S1 [suppl]), most notably for the ECa and clay associations. All these prediction scenarios reproduce the P distribution behavior discussed earlier, that is, the values increase in the west-east direction, with almost the entire area showing P values in the classes from 5 to 25 mg dm⁻³ and 25 to 31 mg dm⁻³.
In the K⁺ spatial distribution maps predicted by the ANNs (Fig. S3 [suppl]), the previously discussed gain from using the clay content as an auxiliary variable in the architecture of the networks is evident. In all scenarios, the best results in representing the spatial behavior of K⁺ (Fig. S1 [suppl]) were obtained when clay was used together with ECa, while the use of ECa alone and/or combined with readings from different periods did not produce satisfactory results.
For scenarios where ECa was used individually, only two classes of K⁺ distribution were obtained using the fuzzy k-means algorithm. The other scenarios, which combine ECa measured in different periods, presented the same number of classes as the K⁺ map generated with values determined in the laboratory, but without describing the exact extent of each class.
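The grouping of map values into classes can be illustrated with a minimal fuzzy k-means (fuzzy c-means) sketch. This is not the implementation used in the study: the function name `fuzzy_c_means`, the fuzziness exponent `m = 2`, the stopping tolerance and the K⁺ values below are all assumptions made only for illustration.

```python
import numpy as np

def _centers(x, u, m):
    """Class centers as membership-weighted means of the data."""
    w = u ** m
    return (w.T @ x) / w.sum(axis=0)[:, None]

def fuzzy_c_means(values, n_classes, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Minimal fuzzy c-means; returns (centers, membership matrix)."""
    x = np.asarray(values, dtype=float).reshape(len(values), -1)
    rng = np.random.default_rng(seed)
    # Random initial memberships, each row normalized to sum to 1
    u = rng.random((x.shape[0], n_classes))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        centers = _centers(x, u, m)
        # Euclidean distance from every point to every center
        dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)  # avoid division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        new_u = inv / inv.sum(axis=1, keepdims=True)
        shift = np.abs(new_u - u).max()
        u = new_u
        if shift < tol:
            break
    return _centers(x, u, m), u

# Hypothetical K+ values (illustrative only, not the study's data)
k_values = [10.0, 11.0, 12.0, 30.0, 31.0, 32.0]
centers, memberships = fuzzy_c_means(k_values, n_classes=2)
# Hard class per point: the class with the highest membership
labels = memberships.argmax(axis=1)
```

Each point receives a degree of membership in every class, and the thematic class is taken here as the one with the highest membership. The number of classes is passed in by the user in this sketch.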
The maps generated using ECa and clay showed the same number of classes in all combinations; however, the best results were observed for the combinations 0_30ECaCLAY, 0_60ECaCLAY, 30_60ECaCLAY and 0_30_60ECaCLAY. These scenarios were able to describe the increasing gradient of K⁺ distribution in the east-west direction and the highest concentration of areas with values greater than 31 mg dm⁻³. Silva & Lima (2014) comment that the prediction of soil attributes allows the reduction of costs in PA systems, making the system economically viable, especially when using information obtained from remote sensing. Despite the advantages of soil attribute prediction processes, Guo et al. (2013) warn that prediction methods must be able to describe the spatial behavior of the variables to be predicted; otherwise, there is a risk of obtaining products with no practical use in the soil management system.
Although they describe the spatial behavior of the available levels of P and K⁺ in the soil, as previously discussed, the maps originating from the values predicted by the ANNs with ECa and clay as input variables did not maintain the well-defined limits between classes observed in the maps generated from laboratory determinations. This behavior is confirmed in the agreement analysis by the Kappa coefficient (Table 7).
The agreement values defined by the Kappa coefficient varied between 0.20 and 0.66 (from bad to moderate) for exchangeable P levels and between 0.16 and 0.78 (from bad to good) for the levels of K⁺. The lowest values were observed for the scenarios that consider ECa individually (0ECa, 30ECa and 60ECa). For both attributes, the best agreement values were observed for the combinations that use clay in association with ECa.
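As an illustration of how such an agreement value is obtained, the sketch below computes Cohen's kappa between two class maps, given as lists with one class label per map cell. The function name `cohen_kappa` and the class labels are hypothetical and are not taken from the study's maps.

```python
from collections import Counter

def cohen_kappa(reference, predicted):
    """Cohen's kappa between two equal-length sequences of class labels."""
    if len(reference) != len(predicted) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(reference)
    # Observed agreement: fraction of cells with the same class in both maps
    observed = sum(r == p for r, p in zip(reference, predicted)) / n
    # Agreement expected by chance, from the marginal class frequencies
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    expected = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # both maps contain a single, identical class
    return (observed - expected) / (1.0 - expected)

# Hypothetical classes per map cell (illustrative only)
lab_classes = [1, 1, 2, 2, 3, 3, 1, 2, 3, 3]  # from laboratory values
ann_classes = [1, 1, 2, 3, 3, 3, 1, 2, 2, 3]  # from ANN predictions
kappa = cohen_kappa(lab_classes, ann_classes)
```

In this example eight of the ten cells agree (observed agreement 0.80) and the chance agreement is 0.34, so kappa is (0.80 - 0.34) / (1 - 0.34), approximately 0.70.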
As previously discussed, the agreement values tend to increase with the number of input variables. For both attributes, the highest values of the Kappa coefficient were observed for the combination 0_30_60ECaCLAY, followed by the combination 0_30ECaCLAY. As the Kappa coefficient indicates the agreement between the two classifications (Kitchen et al., 2005), it is correct to say that for these combinations the accuracy in the prediction of the attributes was satisfactory. Guo et al. (2013) obtained good results using neural networks to predict soil organic matter, predicting both the values of the variable and the spatial pattern of its distribution.
From a practical point of view, the use of a single variable and/or a single measurement period of ECa would be more attractive for the prediction of soil chemical attributes, since it reduces the work of sampling and data analysis. Despite this practical appeal, the joint evaluation of the results of this work shows the need to incorporate auxiliary variables (clay, in this case) when ECa is used as an input in ANN architectures and, equally, to use more conductivity readings to better describe the spatial behavior of P and K⁺ levels.
Based on the argument of the previous paragraph, an alternative to better operationalize data collection and reduce the number of samples is to use the variables of the combination 0_30ECaCLAY, given that all efficiency indicators, as well as the thematic maps and the Kappa coefficient, indicate values very similar to those of the best scenario (0_30_60ECaCLAY). In this context, it would be possible to keep the prediction accuracy close to the best result while reducing the number of ECa readings over time.
It is worth noting that the results found in this work may not be applicable to all types of agricultural production fields and soils; however, the methodology can be used to build prediction models adapted to each condition. This is possible because the results of this research show the plausibility of using artificial neural networks to predict soil chemical attributes based on information on physical attributes (with low temporal variability) and ECa, which, despite its greater temporal variability when compared to clay, has a low cost of determination.
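A minimal sketch of such a prediction model is given below, assuming a network with one hidden layer trained by gradient descent on the mean squared error. The architecture, the learning rate, the function name `train_mlp` and all data are assumptions for illustration: the values are synthetic, generated from a made-up relation, and are not the measurements or the network configuration of this study.

```python
import numpy as np

def train_mlp(X, y, n_hidden=5, learning_rate=0.05, n_epochs=500, seed=0):
    """Train a one-hidden-layer network (tanh units, linear output)."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.5, (n_hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(n_epochs):
        # Forward pass
        h = np.tanh(X @ W1 + b1)
        y_hat = h @ W2 + b2
        err = y_hat - y
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: gradients of the mean squared error
        grad_out = 2.0 * err / len(X)
        grad_h = (grad_out @ W2.T) * (1.0 - h ** 2)
        W2 -= learning_rate * (h.T @ grad_out)
        b2 -= learning_rate * grad_out.sum(axis=0)
        W1 -= learning_rate * (X.T @ grad_h)
        b1 -= learning_rate * grad_h.sum(axis=0)

    def predict(X_new):
        return np.tanh(X_new @ W1 + b1) @ W2 + b2

    return predict, losses

# Synthetic stand-in data for 120 grid points (NOT the study's data)
rng = np.random.default_rng(42)
eca_0 = rng.uniform(2.4, 5.7, 120)            # ECa at sampling time
eca_30 = eca_0 + rng.normal(0.0, 0.2, 120)    # ECa 30 days before
clay = rng.uniform(100.0, 400.0, 120)         # clay content, arbitrary units
# Made-up relation, used only to generate a target to learn
p_obs = 5.0 + 6.0 * eca_0 - 0.03 * clay + rng.normal(0.0, 1.0, 120)

# Inputs analogous to the 0_30ECaCLAY scenario, standardized per column
X = np.column_stack([eca_0, eca_30, clay])
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = ((p_obs - p_obs.mean()) / p_obs.std()).reshape(-1, 1)

predict, losses = train_mlp(X, y)
p_pred = predict(X) * p_obs.std() + p_obs.mean()  # back to original scale
```

Changing the columns stacked into `X` corresponds to changing the prediction scenario, for example using a single ECa reading or adding a third measurement period. A real application would also hold out validation points rather than evaluating on the training data.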
In summary, in the condition of the present study, for this soil class, the ANN model used was able to accurately predict the P and K⁺ values available in the soil. ECa and soil clay contents proved to be good input variables for predicting soil chemical attributes. The best results in the prediction process of the P and K⁺ attributes were obtained with the combination of ECa and the clay content, and the combinations of two measurement periods of ECa increased the accuracy of the predicted values. The findings of this research point to the feasibility of using these variables and methodology to predictively assess the spatial behavior of chemical attributes of the soil.

Table 7. Kappa coefficients calculated from the classification of soil attributes determined by laboratory analysis and estimated by artificial neural networks.
0ECa: measurement on the same day as the soil sample collection; 30ECa: measurement 30 days before soil sample collection; 60ECa: measurement 60 days before soil sample collection. 1: bad; 2: reasonable; 3: moderate; 4: good.