Sampling redesign of soil penetration resistance in spatial t-Student models

Aim of study: To reduce the sample size in an agricultural area of 167.35 hectares, cultivated with soybean, to analyze the spatial dependence of soil penetration resistance (SPR) with outliers. Cascavel, Brazil Material and methods: The reduction of sample size was made by the univariate effective sample size ( 𝐸𝐸𝐸𝐸𝐸𝐸 𝑡𝑡 ) methodology, assuming that the t-Student model represents the probability distribution of SPR. Main results: The radius and the intensity of spatial dependence have an inverse relationship with the estimated value of the 𝐸𝐸𝐸𝐸𝐸𝐸 𝑡𝑡 . For the depths of SPR with spatial dependence, the highest estimated value of the 𝐸𝐸𝐸𝐸𝐸𝐸 𝑡𝑡 reduced the sample size by 40%. From the new sample size, the sampling redesign was performed. The accuracy indexes showed differences between the thematic maps with the origi nal and reduced sampling designs. However, the lowest values of the standard error in the parameters of the spatial dependence structure evidenced that the new sampling design was appropriate. Besides, models of semivariance function were efficiently estimated, which allowed identifying the existence of spatial dependence in all depth of SPR. Research highlights: The sample size was reduced by 40%, allowing for lesser financial investments with data collection and laboratory analysis of soil samples in the next mappings in the agricultural area. The spatial t-Student model was able to reduce the influence of outliers in the spatial dependence structure. size of variables with Student’s t-distribution); GPS (global positioning system); OA (overall accuracy); PA (precision agriculture); RNE (relative nugget effect); SD (standard deviation); SE (standard error); SPR (soil penetration resistance); T (Tau concordance index); UTM (Universal Transverse Mercator). Authors’ contributions: All authors: conceptualized the paper, statistical analysis of data, final revision and discussion. LEDC: re viewed the literature and edited the working versions of the manuscript.


Introduction
The Brazilian economy is directly related to agribusiness, and soybean (Glycine max (L.) Merrill) lead this scenario, which figures as the main grain exported by Brazil. Given the economic importance of this commodity, to preserve the productivity and increase it, it is important to know the spatial variability of soybean yield and its relationship with the physical and chemical properties of the soil (Sobjak et al., 2016). From this perspective, precision agriculture (PA) techniques use the knowledge of the spatial variability of grain yield and the physical and chemical properties of the soil, to find the ideal application of the nutrient according to local needs (Molin et al., 2015). The premise of PA is to use localized management of agricultural inputs to increase profits, reduce losses, and preserve the environment (Alamo et al., 2012;Bier & Souza, 2017).
Geostatistics can help PA, as its techniques make it possible to determine the spatial dependence structure and describe the spatial variability of the yield of soybean and the soil attributes (Dalposso et al., 2016(Dalposso et al., , 2018De Bastiani et al., 2017;Schemmer et al., 2017;Fagundes et al., 2018;Grzegozewski et al., 2020). The geostatistical techniques consider the value observed and geographic location of the physical-chemical properties of the soil, considering a sampling of some georeferenced points in the area. Thus, the entire area is characterized by a small representative portion of it (Wang et al., 2013).
Knowing the spatial distribution of soil attributes and agricultural production is possible, even for small farmers. Combining sample planning and spatial statistics techniques, it is possible to characterize the spatial variability of attributes without using equipment with high investment, such as a harvest monitor (Schemberger et al., 2017).
Also, to better understand the nutritional characteristics of the soil, it is important to combine samples of macro-and micro-nutrients and physical attributes, such as soil penetration resistance (SPR), which is related to the analysis of soil compaction. Compacted soils tend to hinder the availability of nutrients and water to the plant, which interferes with the growth of the roots and, consequently, with the development of the plant and the grain, thus affecting productivity (Valadão et al., 2015(Valadão et al., , 2017Marinello et al., 2017;Sivarajan et al., 2018;Colombi & Keller, 2019).
Still, in terms of sampling, there are studies that aim to reduce costs with collection and laboratory analysis of the sample. These studies proposed methods to reduce the number of sampling points to be used in future experiments in the agricultural area, without having a considerable loss in its mapping (Griffith, 2005;Guedes et al., 2014Guedes et al., , 2016Domenech et al., 2017;Maltauro et al., 2019). One of the proposals is the effective sample size, which considers that some sample points may be highly correlated with each other, providing unnecessary cost with collection and laboratory analyzes, since such points add repeated information regarding spatial dependence (Vallejos & Osorio, 2014). The effective sample size represents the estimation of a new sample size considering the effects of the spatial autocorrelation and the purpose of estimating the sample mean of the value of the georeferenced variable as precisely as possible (Griffith, 2005).
The univariate effective sample size estimation developed by Griffith (2005), assumes that the georeferenced attribute has a normal probability distribution. However, there are georeferenced data that do not present a normal probability distribution, especially because such distribution is sensitive to outliers (Fagundes et al., 2018). In this way, Vallejos & Osorio (2014) suggested another more inclusive approach to calculate the estimated value of univariate effective sample size, wich considers the presence of outliers and assumes that the georeferenced variable has Student's t-distribution. The Student's t-distribution allows the class of errors to be extended to other probability distributions to better accommodate the outliers (Assumpção et al., 2014;De Bastiani et al., 2015;Schemmer et al., 2017).
The estimation of effective sample size requires an initial sampling design in the agricultural area and the knowledge of the spatial dependence structure of the georeferenced variable. Generally, when this information is not previously known and the data collection is being initiated, the initial sample size can be determined by the ratio of area to sample size (Wang et al., 2013). For example, the PA recommend considering a maximum of two hectares per sampling point (Molin et al., 2015).
Considering the availability of information obtained previously from the sample design in an experimental area, this study had as main objectives: i) to consider variables that present Student's t-distribution, using the expectation-maximization (EM) algorithm to model the data (Assumpção et al., 2014); ii) to use the spatial dependence structure of SPR to redefine and to reduce the number of sample elements collected in this area by univariate effective sample methodology, considering the existence of the sample points correlated with each other.

Material and methods
We developed two studies: in the first, simulated data was considered, and in the second, we used data on SPR obtained in an agricultural area with soybean cultivation. The simulation study complements the agricultural one because with the simulated data is possible to reproduce a variety of scenarios present in the real data. Therefore, the two studies add practical and theoretical knowledge about sample resizing in soil attributes with spatial dependence structure.

Description of simulations
Consider a stochastic process { ( ), ∈ ⊂ ℝ 2 } , = 1, … , , stationary and isotropic, in which = ( ( 1 ), … , ( )) is a × 1 random vector, where Y( 1 ), … , Y( ) are the observed values of the random variable under study in i sampled spatial locations, with = 1, … , and ∈ ⊂ ℝ 2 . Suppose that has an n-varied Student's t-distribution (De Bastiani et al., 2015), i.e., ., ~ ( , , ν) , where μ is the mean of Y, a constant 3 Sampling redesign of soil penetration resistance in spatial t-Student models value in all i spatial locations is an × 1 is an is an × 1 unit-dimensional vector; ν (ν > 0) is the degree of freedom fixed; and = 1 + 2 ( 3 ) is an × cale matrix, non-singular, where 1 ≥ 0 and 2 ≥ 0 are the nugget effect and partial sill parameters, respectively, I n is the × identity matrix, and ( 3 ) is an × symmetric matrix, where 3 > 0 is a function of the range ( ( 3 ) = ) . The practical range (a) is the spatial dependence radius, the distance at which spatial dependence exists between samples. The parameters 1 , 2 , and 3 , make up the spatial dependence structure of a georeferenced variable (Diggle & Ribeiro Jr, 2007;Soares, 2014). We considered 11 variables (V1, …, V11) with different spatial dependence structures (Fig.  1A). The variables were obtained by simultaneously varying the spatial dependence radius (a) and the intensity of spatial dependence, measured by the relative nugget effect (RNE). As we set the parameter 2 value then the RNE was directly influenced by the variation of the nugget effect ( 1 , 2 , and 3 ). The smallest spatial dependence radius used was 0.3 km, and the largest ranged between 1.0 and 1.2 km. The remaining practical ranges (0.5 and 0.6 km) were considered intermediate based on the maximum distance from the agricultural area (1.8 km). The RNE was considered from moderate (between 25% and 75%) to strong (≤ 25%) (Cambardella et al., 1994).
Given the linear spatial model (Uribe-Opazo et al., 2012), we performed 100 simulations for each of the 11 variables using a Monte Carlo experiment from the Cholesky decomposition of the scale matrix (Cressie, 2015). Each simulation generates a random sample set of these variables, maintaining the characteristics of the spatial dependence structure, and represents different datasets in different agricultural areas or crop years (Mooney, 1997). In these simulations, we fixed the degree of freedom (ν = 5), the mean (μ = 5), the partial sill ( 2 = 1), and the exponential model. As sample planning for the simulations, we used the same configuration (lattice plus close pairs) from the commercial agricultural area under study (Fig.  1C). Other information about the simulations is given in the methodological scheme (Fig. 1B).
Following the scheme presented in Fig. 1B, after apply the EM algorithm to estimate the parameter vector θ for each simulated variable, the value of the effective sample size using the Student's t-distribution was estimated ( , Eq. 1) (Vallejos & Osorio, 2014): (1) where n is the number of simulated sampling points in the original grid (n ≥ 1); v is the degree of freedom (v > 2) 1 is an × 1 unit vector; (̂) = [( (̂) )] is an × estimated spatial correlation matrix of the sample points, where the estimated spatial correlation between the i-th and the j-th sampling point are given by (Eq. 2); r ij are the elements of the ( 3 ) matrix, which calculation depends on the geostatistical model and on the Euclidean distance between observations (De ; and 1 , 2 are the estimated values of the nugget effect and partial sill parameters, respectively. What differs in estimating the univariate effective sample size by considering random vectors with normal probability distribution ( ) in relation to those with Student's t-distribution ( ), is the constant ( + ) . We obtained this constant from the Fisher information matrix for linear spatial models with Student's t-distribution (De . As v > 2 and n ≥ 1, we have v + + 2 > + and is necessarily lower than .

Description of the experimental data
The dataset comes from a commercial area with 167.35 hectares, cultivated with soybean, located in the municipality of Cascavel-Paraná-Brazil, with approximate geographical coordinates of latitude 24.95º South and longitude 53.37º West, and 650 m of average altitude (Fig.  1C). The climate of the region is temperate mesothermic and superhumid, climate type Cfa (Koeppen) (Aparecido et al., 2016), with an average annual temperature of 21ºC. The soil is classified as a Red Dystroferric Latosol with clay texture (EMBRAPA, 2013).
We used a lattice plus close pairs sampling design, with 102 sampling points. This design contained a regular grid (with minimum distance between points equals to 141 m), to which we added 19 sample points (locations). These added locations presented smaller distances with some points of the regular grid (50 m and 75 m). The sample was georeferenced and located with the aid of a signal receiving apparatus with a Geoexplore 3 (Trimble®) Global Positioning System (GPS) set up for the Universal Transverse Mercator (UTM) coordinate system.
In this study, soil resistance to root penetration (in MPa) at depths of 0-10 cm (SPR 0-10 cm), 11-20 cm (SPR 11-20 cm), 21-30 cm (SPR 21-30 cm), and 31-40 cm (SPR 31-40 cm) were used. In terms of improvement in soil management, the study of the spatial dependence of SPR has important agricultural relevance, since this soil attribute is inversely related to root growth and crop yield (Gül-ser et al., 2016). The experimental data of this physical attribute refers to the crop year 2015-2016 and belongs to the database of the Laboratory of Spatial Statistics and the Laboratory of Applied Statistics of the Western Paraná State University (UNIOESTE), Cascavel/Brazil.
The determination of SPR was measured by the penetrograph, as follows: for each sampling point, we performed three readings per centimeter, from 0 to 40 cm, covering the four depths considered (0-10 cm, 11-20 cm, 21-30 cm, and 31-40 cm). The data obtained was transformed in MPa, and the value of the SPR at each depth consisted of the arithmetic mean of the three measurements.
Soil penetration resistance was assumed to have a t-Student probability distribution. From the original sampling design and for each depth, we performed the exploratory and geostatistical analyzes of SPR (Figs. 2A and 2B, respectively). The analyses performed are described in the methodological scheme of Fig. 2, and more information about the methodology is obtained in Cressie (2015).
For each layer of SPR (at depths 0-10 cm, 11-20 cm, 21-30 cm, and 31-40 cm), the value of the effective sample size was estimated ( , Eq. 1) (Fig. 2) by the same methodology applied in the simulated data.
Through the estimated lues in each SPR layer, we redefined a single reduced sample size. The highest estimated value of the was taken (n* = MAX ( ), Fig. 2) from the variables with spatial dependence, i.e., variables in which the value of the spatial dependence radius is not small compared relative to the size of the experimental area and which intensity of spatial dependence (RNE) was at least moderate (Cambardella et al., 1994). We used only georeferenced variables with spatial dependence in the calculation of the since georeferenced attributes without spatial dependence do not present a reduction in the number of sample points (Vallejos & Osorio, 2004).
The highest value criterion was established since a greater number of sampling points is better to capture the spatial variability of variables that have different spatial dependence structures (Pautsch et al., 1998;Diggle & Ribeiro Jr, 2007). Therefore, the tendency is to obtain more representative thematic maps concerning the spatial variability of the attribute in experimental area (Kestring et al., 2015). This is justified by two characteristics: (a) homogeneous variables (with less spatial variability in the area), can be collected with a smaller number of sample units, which would avoid redundant data or oversampling; and (b) variables with rapid change in spatial structure can be collected more intensively, which would avoid undersampling.
To verify the suitability of the reduced sample size, concerning the original sampling design (Fig. 1C), a random design of the original sampling design with sample size n* was selected. For this reduced sample size, the exploratory and geostatistical analysis were also performed Sampling redesign of soil penetration resistance in spatial t-Student models (Figs. 2A and 2B,respectively). Finally, we compared the results obtained between the two sample configurations (original and reduced), using the methodologies presented in Fig. 2 (C and D).
The simulations, and the statistical and geostatistical analysis, were prepared in the software R (R Development Core Team, 2020) using the geoR package (Ribeiro Jr & Diggle, 2001). A computational routine developed in the software R (R Development Core Team, 2020) using the geoR (Ribeiro Jr & Diggle, 2001) and matrixcalc (Novomestky, 2012) packages (and available at goo.gl/JrvtnJ) to estimate the effective sample size ( ).

Simulation studies
The mean and the standard deviation of the estimated values of the were similar for most pairs of variables in which the values of the nugget effect were different and the fixed range was maintained (V1 and V2; V3, V4, and V7; V5 and V6; V9 and V11) (Fig. 3). The estimated values evidenced the existence of three groups of variables (Fig. 3). The first two groups presented, respectively, the highest and intermediate estimated values, being them: the group formed by variables V1 and V2, whose estimated mean value of the was 40 and 44 sample points, in that order. Variables V3, V4, V5, V6, and V7 formed the second group, where the estimated mean value of the ranged from 15 to 31 sample points. These two groups of variables also exhibited high values of standard deviations, with variation of 11 to 14 sample points. The simulated variables V1 and V2 have a small practical range (α = 0.3 km), mainly when compared to the maximum distance between the coordinates of the simulated area (~ 1.8 km). Variables V3, V4, V5, V6, and V7 exhibited spatial dependence radius slightly higher than those of the first group (ranging from 0.5 to 0.6 km), which contributed to the fact that the estimated values were smaller when compared to those obtained in the previous group. The third group, formed by variables V8, V9, V10, and V11, presented the smallest mean values of (ranged from 6 to 8 sample points) (Fig. 3). These four variables have in common, the largest values of the simulated spatial dependence radius (between 1.0 and 1.2 : effective sample size. km). In general, the estimated value ranged from 6 to 44 sample points and provided a reduction between 57% and 95% in the number of sampling points (Fig. 3).

Application of the methodology in soil penetration resistance
The estimated value for SPR at depths 11-20 cm and 21-30 cm was 95 e 101, respectively. The SPR observed at depths 0-10 cm and 31-40 cm had higher reductions in the number of sampling points, with ̂ equal to 51 and 60, respectively, which represents a reduction between 40% and 50%.
Considering the layers of SPR in which spatial dependence was identified (at depths 0-10 cm and 31-40 cm) and the maximum estimated value of the effective sample size observed in these layers, a sample resizing was obtained, reducing the sample size to 60 sample points. Thus, a new sample configuration with 60 points, chosen randomly from the 102 sample points of the original grid, was selected for the study of spatial dependence of SPR.
In the exploratory analysis, the values of the coefficient of variation (CV) showed that the SPR variability is greater in the surface and it is reduced when increasing the sampling depth in the soil (Table 1). The magnitudes of the CVs indicated that there was a medium dispersion of SPR at all depths (Warrick & Nielsen, 1980) (Table 1). Besides, we observed that the reduction in the number of sample points did not influence the SPR variability (Table 1).
The depths 11-20 cm (Fig. 4B) and 31-40 cm (Fig.  4D) showed the greatest amount of outliers (four each), located in the central and western regions of experimental area. The sample points 82 and 34 exhibited outliers, in all depth layers of SPR, except at depth 31-40 cm, in which point 34 was not considered an outlier.
We observed in the geostatistical analysis that, for all depth layers of SPR, the spatial dependence structure can be considered isotropic, i.e., depends only on the distance separating the locations observed, and does not differ with the direction .
The results about the best values for the degree of freedom (v) and the shape parameter (Table 2), showed that for both sample sizes, the model and degree of freedom were the same only for SPR at depth of 31-40 cm. We verified in this depth the lowest values of the standard error (SE) in the Matérn family model with = 0.5, for the degree of freedom v = 10. For SPR at depth of 0-10 cm, the lowest values estimated from the SE were found in the Matérn family model with = 2.5 and shape parameter = 5, or the original sampling design; and with =0.5 and = 10 for the reduced sampling design. At depth 11-20 cm, in both sampling designs, was adjusted the Matérn family model = 2.5 to the semivariance function, but with different degrees of freedom = 5 for the original grid, and = 10 for the reduced grid.
Finally, at depth 21-30 cm of the SPR, although the sampling designs presented the same value for the shape parameter (v = 5), the lowest estimated values of the SE were obtained by the Matérn family model with = 1.5 and 2.5, pectively, for the original and reduced sampling designs ( Table 2). The estimated values of these SEs, at depths 0-10 cm and 11-20 cm of the SPR (Table 2), were smaller in the estimated models considering the reduced sampling design when compared to values obtained in the estimated models using the original sampling design. Besides, for the other depth layers, the estimated value of   Sampling redesign of soil penetration resistance in spatial t-Student models the SE of the range function ( 3 ), was also lower for the estimated models considering the reduced sample configuration ( Table 2). The values obtained by the cross-validation method showed a small increase in errors of the spatial prediction with the reduced sampling design ( Table 2). The errors increased by 7.5%, 6.2%, 9.6%, and 11.5%, respectively, at depths 0-10 cm, 11-20 cm, 21-30 cm, and 31-40 cm of the SPR, comparing with the original sampling design.
For the original sampling design, the spatial dependence structure in the intermediate depth layers of the SPR (11-20 cm and 21-30 cm) presented pure nugget effect, due to the low values of the practical range (180.5 and 110.2 m) and the low spatial dependence (̂ ≥ 5%; Cambardella et al., 1994) (Table 2).
Considering the reduced sampling design, the intensity of spatial dependence was moderate in these intermediate-depth layers (̂ between 25% and 75%; Cambardella et al., 1994). Also, at depth 11-20 cm, there was an increase in the estimated value of the spatial dependence radius to the original sampling design (from 180.5 to 209.1 m). In the other depth layers of the SPR, there was a decrea-se in the estimated practical range (ranging from 7.6 to 31.9 m), compared to that obtained with the original sampling design ( Table 2).
The estimated values of the practical range were relatively low for both sampling designs and in all depth layers of the SPR. That is because the maximum distance in the experimental area is approximately 1,800 m, the ranges ranged from 110.2 to 291.5 m in the original sampling design, and from 78.3 to 273.2 m in the reduced sampling design (Table 2).
We observed visual differences between the maps elaborated considering the two sampling designs, which are most noticeable at depth 11-20 cm (Fig. 5B). According to the classification of Anderson et al. (2001), in most of the depth layers of the SPR, there was a low percentage of hits between the reference map (original sampling design) and the model map (reduced sampling design), because the estimated value of the overall accuracy (OA) was lower than 85%. This indicates that a smaller number of pixels were classified in the same class interval in both maps, evidencing differences between the elaborated maps considering the two sampling designs. The only    Table 2. Estimated values of the parameters that define the spatial dependence structure in each depth layer of soil penetration resistance (SPR, in MPa) from the best values of the shape parameters and considering the original ( n = 102 points) and reduced (n* = 60 points) samplings designs.
: degree of freedom.
: shape parameter of the Matérn family model. Estimated values of: μ : mean, φ 1 : nugget effect, φ 2 : partial sill, 3 : function of the range, â â : pratical range (kilometers), ̂ = 100 1 9 Sampling redesign of soil penetration resistance in spatial t-Student models exception was at depth 21-30 cm (Fig. 5C) in which similarity between the maps made with the original and reduced sampling design was observed (OA > 85%). The Tau concordance index (T), unlike the OA, accounts for not only the proportion of pixels classified in the same class interval in the reference and model maps but also those whose classification was not the same in both maps. The maps made considering the original and reduced sampling designs presented from low to medium accuracy (T < 0.80; Krippendorff, 2004), with the exception at depth 21-30 cm of the SPR (Fig. 5).
Some classes of thematic maps elaborated using the original sampling design presented some null pixels, as can be seen visually at depths of 0-10 cm, 11-20 cm, and 21-30 cm of the SPR (Figs. 5A, 5B, and 5C, respectively). Besides, at depth 21-30 cm, where was obtained high values of the EG and Tau accuracy indexes, a high number of pixels (more than 90% of total) in the same classes was observed (Fig. 5C). Still about at depth 21-30 cm, we observed the formation of circular regions around the sample points (Fig. 5C).
Finally, the estimated values of SPR, using the reduced sampling design, showed the existence of limitations to root growth that varied from low to moderate in almost all the agricultural areas (Canarache, 1991).

Simulation studies
Considering the 100 simulations of each variable, the graph with the means and standard deviations of the estimated values of the univariate effective sample size showed that the variation of the value of the nugget effect did not generate a relevant change in the estimated value of the effective sample size (Fig. 3). The practical range negatively influenced the estimated values, since the greater the practical range of the variable, the lower the estimated value. Although a different sample configuration and size, and even another probabilistic distribution (normal) were considered, the simulation studies of Vallejos &Osorio (2014) andDal Canton et al. (2021) reached similar conclusions regarding the influence of the practical range in reducing the number of sampling points.
The high difference in the estimated values ( Fig.  3) can be explained by the discrepancy between the variables concerning the values of the parameters of spatial dependence, mainly regarding the practical range, which variation was of 0.3 to 1.2 km.
Studies carried out in agricultural areas of a smaller size than the one considered in this paper (< 50 ha), characterized the spatial dependence on soil attributes using a sample size smaller than 50 sample points (Carvalho et al., 2013;Araújo et al., 2014, Tavares et al., 2014, sam-ple size similar to that obtained in the present study for most simulated variables.

Application of the methodology in soil penetration resistance
The values of the CV obtained by Johann et al. (2004) and Bazzi et al. (2013) showed results similar to those of this work, with a moderate classification for CVs in agricultural areas in Western Paraná with soybean planting, and with similar conditions of management, climate, and soil. Besides, the SPR variability is reduced when increasing the sampling depth in the soil, corroborating with that obtained in this study.
The SPR at depths 11-20 cm and 21-30 cm practically did not show a reduction in sample size. This fact is justifiable, mainly due to the practical range influence on the estimated value, verified in the simulation studies of the present study, and verified in Vallejos & Osorio (2014) and Dal Canton et al. (2021) as well. The SPR at depths 11-20 cm and 21-30 cm presented a small estimated value of the spatial dependence radius (110.2 and 180.5 m), relative to the size of the experimental area, and also a low intensity of spatial dependence (̂ > 75%; Cambardella et al., 1994) (Table 2). The higher reductions in the number of sampling points presented at depths 0-10 cm and 31-40 cm are due to the higher estimated values of practical range (291.5 and 222.5 m) ( Table 2).
Griffith (2005) obtained a reduction in sample size (from 36% to 45%) similar to that found in this study, which varied between 40% and 50%, using different sample configurations, attribute probability distribution, and soil chemical attributes. Domenech et al. (2017) considered auxiliary information measurement to map the attribute of interest (soil depth to the petrocalcic horizon), and obtained a reduction in sample size similar to this study also (from 50% to 70%), although their methodologies for optimization and selection of sampling points were different from this study. The mentioned authors obtained sample reductions similar to those of the present study, and the thematic maps obtained by them were considered efficient.
It was found in the literature researches to analyze the spatial variability at different depths of SPR and used between 49 and 60 sample points (Rosalen et al., 2011;Rodrigues et al., 2014;Tavares et al., 2014). Considering the new sample configuration, reduced to 60 sample points, these authors used values of sample size similar to this research, although the magnitude of their mapped experimental areas was lower than that of this study (< 50 ha).
Comparing the original and reduced sampling designs, the estimated values of the relative nugget effect (RNE) and the practical range indicate that with a reduced number of sample points there was an increase in spatial dependence and minor changes in the spatial dependence radius (Table 2). Besides, the estimated values of the SEs of the estimated parameters that define the spatial dependence structure were smaller in the estimated models from the reduced sampling design for the majority of the cases (Table 2). This shows that even with a smaller number of sample points, it was possible to verify the existence of spatial dependence in all depth layers of SPR.
The increase in spatial prediction errors after sampling redesign was already expected, as the number of sample points was reduced by 40%. The literature shows that the greater the number of sample points, the better the result of the interpolation, as shown by the studies by Coelho et al. (2009), Kestring et al. (2015, and Guedes et al. (2016), using different sample densities and metrics to calculate errors. However, the greater the number of observations, the greater the financial cost. Thus, with the magnitude of sample reduction obtained in this study, the increase in spatial prediction errors can be considered small.
The results also indicate that even using a smaller number of sampling points in the study area, efficiently models were estimated to the semivariance function and that they were able to identify the existence of spatial dependence in all depth layers of SPR. This is an important feature of this study since the reduction of the sample size difficult the semivariance calculation (Kestring et al., 2015). However, although it is possible to verify the existence of spatial dependence in all depth layers of SPR, the visual analysis and the accuracy indices (OA and Tau) showed that there are differences between the thematic maps generated with the original and reduced sampling design, indicating that there was the influence of sample size on the spatial dependence characterization of SPR.
About the circular regions around the sampling points, identified at depth 21-30 cm of the SPR map (Fig. 5C), we observed the low estimated value of the practical range (̂ = 78.30 m), near the shortest distance between sample points (~ 50 m), which resulted in the formation of small subregions centered in the sample points, a phenomenon known as 'bull eyes effect' (Menezes et al., 2016), and also observed by Dalposso et al. (2018) andDal Canton et al. (2021).
The results showed that the univariate methodology proved to be advantageous considering the lowest cost in the sampling process due to the 40% reduction in the sample size and the results obtained in the characterization of the spatial dependence in the experimental area. Also, the method proposed in this study obtained a single sample size for all attributes, based on the variables with spatial dependence structure and the maximum estimated value of the among them.