Modelling the number of olive groves in Spanish municipalities

The univariate generalized Waring distribution (UGWD) is presented as a new model for count data in agriculture, and its goodness of fit is assessed. In this paper, it was used to model the number of olive groves in the 8,091 Spanish municipalities recorded in the 2009 Agricultural Census, according to which the production of oil olives accounted for 94% of total output, while that of table olives represented 6% (with an average of 44.84 and 4.06 holdings per Spanish municipality, respectively). The UGWD is suitable for fitting this type of discrete data, which is strongly asymmetric, with most observations at the lowest values. This novel use of the UGWD can provide the foundation for future research in agriculture, with the advantage over other discrete distributions that it enables the analyst to split the variance. After defining the distribution, we analysed various methods for estimating its parameters, namely maximum likelihood, the method of moments and a variant of the latter, the method of frequencies and moments. For oil olives, the chi-square goodness of fit test gives p-values of 0.9992, 0.9967 and 0.9977, respectively. However, a poor fit was obtained for the table olive distribution. Finally, the variance was split, following Irwin, into three components related to random factors, external factors and internal differences. For the distribution of the number of olive grove holdings, this splitting showed that random and external factors account for only about 0.22% and 0.05% of the variance, respectively. Therefore, internal differences within municipalities play an important role in determining total variability.
Additional key words: table olive; oil olive; agricultural holdings; Waring distribution; estimation.
Abbreviations used: L (liability); MF12 (method of one equation of moments and two relations between frequency); MF21 (method of two relations between moments and one equation of frequency); MLE (method of log-likelihood optimisation); MM3 (method of the three relations between moments); P (proneness); R (randomness); UGWD (Univariate Generalized Waring Distribution).
Citation: Huete, M. D.; Marmolejo, J. A. (2016). Modelling the number of olive groves in Spanish municipalities. Spanish Journal of Agricultural Research, Volume 14, Issue 1, e0201. http://dx.doi.org/10.5424/sjar/2016141-7687.
Received: 12 Mar 2015. Accepted: 20 Jan 2016.
Copyright © 2015 INIA. This is an open access article distributed under the terms of the Creative Commons Attribution-Non Commercial (by-nc) Spain 3.0 Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Funding: This research is financed by Vice-Rector’s Office for Political Science and Research-University of Granada, through the project “Social-Labour Statistics and Demography” (30.BB.11.1101) at the Faculty of Labour Sciences.
Competing interests: The authors have declared that no competing interests exist.
Correspondence should be addressed to María-Dolores Huete-Morales: mdhuete@ugr.es


Introduction
The question of olive production has aroused much interest in Spain and other areas, as olive oil (especially virgin oil) is the cornerstone of the Mediterranean diet; its consumption has been associated with a lower risk of cardiovascular disease (Fernández-Jarne et al., 2001), obesity, metabolic syndrome, type 2 diabetes and hypertension. Moreover, it reduces the risk of cancer (López-Miranda et al., 2010) and ageing by inhibiting oxidative stress (Owen et al., 2000; Gimeno et al., 2002).
Spain is the world's largest producer of olive oil, and the province of Jaén (in the south of the Iberian
was later fitted by Irwin (1968) by a univariate generalized Waring distribution, improving on the results obtained by Newbold (1926). Since then, this distribution has been used, independently of the theory of accidents (Irwin, 1963; Xekalaki, 1983c), in fields such as biology (Irwin, 1968), reliability theory (Xekalaki, 1983a), library science (Boxenbaum et al., 1987), computer science (Wolfran, 2003), psychiatry (Canal & Micciolo, 1999), medicine (Kemp, 2001) and linguistics and economics (Kendall & Stuart, 1969). To the best of our knowledge, however, it has never been applied in studies related to agriculture, and therefore this study makes a novel contribution that may be useful for future research. Thus, we present the univariate generalized Waring distribution as a useful tool in this area of study. The exploratory data analysis performed justifies the use of the Waring distribution for the variable number of olive holdings registered in Spain. We compared parameter estimation methods and determined which of them allowed us to split the variance of this distribution into three components. This approach opens up a range of possibilities that are not available with other distributions.

Material and methods
The micro-data used in this study were obtained from the Agricultural Census (INE, 2009). This Census provides detailed information on the crops grown on all Spanish agricultural holdings, broken down by municipalities, thus supplying the information necessary for this study. We obtained polygon shapefiles of the Spanish municipalities, in the ETRS89 UTM 30N coordinate system (ESRI Map Service, http://www.arcgis.com/). Maps, graphs and distribution fitting were obtained using the free software R (https://www.r-project.org/) and the GWRM package (Sáez-Castillo et al., 2010), together with SPSS 20.0 to adapt the data and export them to R. In the following, we define the Waring distribution and describe the fitting methods used.

Waring distribution
A random variable X follows a univariate generalized Waring distribution (UGWD (a, k; ρ)), with parameters a, k and ρ, when it has the following probability mass function:
of the 2012/2013 season, Spain had exported 728,621 tonnes of olives/olive oil, Italy 932,000 tonnes and Greece 229,137 tonnes.
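For reference, the probability mass function of the UGWD(a, k; ρ) is usually written as sketched below, together with the probability generating function that follows from it by Gauss's summation of the hypergeometric series at 1. The Pochhammer notation (a)_r = Γ(a + r)/Γ(a) and the symbol G_X are conventions assumed for this sketch.

```latex
\begin{align*}
P(X = r) &= \frac{\Gamma(a+\rho)\,\Gamma(k+\rho)}{\Gamma(\rho)\,\Gamma(a+k+\rho)}
            \cdot \frac{(a)_r\,(k)_r}{(a+k+\rho)_r}\cdot\frac{1}{r!},
            \qquad r = 0, 1, 2, \ldots \\[4pt]
G_X(s)   &= \frac{\Gamma(a+\rho)\,\Gamma(k+\rho)}{\Gamma(\rho)\,\Gamma(a+k+\rho)}\;
            {}_2F_1\!\left(a,\, k;\; a+k+\rho;\; s\right)
\end{align*}
```

The expression is symmetric in a and k, so exchanging the two parameters leaves the distribution unchanged.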
Some of the distributions used to model discrete data are well known. The most commonly used is the Poisson distribution, which is simple to use and is widely applied. However, it underestimates the variance when the data are overdispersed; to overcome this problem, mixture distributions have been proposed, such as the negative binomial distribution, derived from mixing the Poisson and Gamma distributions. Another mixture, which has been applied to address issues in the field of ecology (Katti, 1966), is the Poisson-beta distribution. In the present study, we propose to apply the Waring distribution (a mixture of the negative binomial and beta distributions) to discrete data obtained in the agricultural context.
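The overdispersion argument can be illustrated numerically. The sketch below, in Python rather than the R used in the study, computes the mean and variance of a negative binomial obtained as a Poisson-gamma mixture. The function names and the parameter values (shape 2, scale 3) are hypothetical choices for illustration only.

```python
import math

def neg_binomial_pmf(x, shape, scale):
    """P(X = x) when X | lam ~ Poisson(lam) and lam ~ Gamma(shape, scale)."""
    p = 1.0 / (1.0 + scale)  # success probability implied by the gamma scale
    log_pmf = (math.lgamma(x + shape) - math.lgamma(shape) - math.lgamma(x + 1)
               + shape * math.log(p) + x * math.log(1.0 - p))
    return math.exp(log_pmf)

def mean_and_variance(pmf, upper=2000):
    """Mean and variance of a count distribution by direct summation."""
    probs = [pmf(x) for x in range(upper)]
    mean = sum(x * q for x, q in enumerate(probs))
    var = sum((x - mean) ** 2 * q for x, q in enumerate(probs))
    return mean, var

# Illustrative parameters (assumed, not taken from the census data).
mean, var = mean_and_variance(lambda x: neg_binomial_pmf(x, shape=2.0, scale=3.0))
dispersion_index = var / mean  # a Poisson model would force this ratio to 1
```

With these values the variance exceeds the mean by the factor 1 + scale, which a Poisson model cannot reproduce.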
The generalized Waring distribution (Irwin, 1968, 1975) is a discrete distribution on non-negative integers. This distribution belongs to the Kemp type 4 family of distributions, and has an analogous continuous distribution, which in general is Pearson type 6, although in special cases it may be Pearson type 3, 4 or 5. It is infinitely divisible, self-compensating as defined by Danial (1988) and is a distribution "in limit terms", complete (Xekalaki, 1983b). This distribution can be considered a particular balance distribution (Ferreri, 1984); efficient algorithms based on it have been created for generating random variables from Sibuya's digamma and trigamma distributions (Devroye, 1992), and it has been applied in stochastic aggregation models (Duerr & Dietz, 2000). Sarabia & Castillo (2003) proposed two multivariate extensions of this distribution: one by means of the Sarmanov-Lee distribution with beta marginal, and the other using the concept of the conditional specification of distributions. Finally, Rodríguez-Avi et al. (2003) studied different parameter estimation methods based on moments and/or frequencies for Gauss' family of hypergeometric distributions, while Rodríguez-Avi et al. (2007) presented an example with which they compared the results obtained applying the maximum likelihood method to the negative binomial distribution, the univariate generalized Waring distribution and the extended Waring distribution, the latter being a tetraparametric univariate distribution generated by Gauss' hypergeometric function. This includes the generalized Waring distribution, as a combination of the negative binomial distribution and the beta type 1 generalized distribution.
The generalized Waring distribution has been applied in many scientific fields. Newbold (1926) reported that the distribution of accidents to workers in a soap factory fitted a negative binomial distribution. This result
(r + 1) 2 + (a .. and where β_r is the r-th probability. Assuming r = 0, we obtain the first relation, which is added to the first two equations between moments (Eq. [2]). This approach enables us to obtain â, k̂ and ρ̂.
Another possibility is the method of one relation between moments and two between probabilities (MF12). However, careful analysis shows that this method does not provide a good fit; therefore, it should not be used to estimate the parameters of this distribution.
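The combination of two moment equations with the first relation between probabilities can be sketched in closed form. The block below assumes the standard recurrence p_{r+1}/p_r = (a + r)(k + r)/((a + k + ρ + r)(r + 1)), which at r = 0 gives (a + k + ρ) p_1 = a k p_0, and pairs it with the first two factorial moments. It is an illustrative Python sketch, not necessarily identical to the authors' system; `fit_mf21` and its argument names are hypothetical.

```python
import math

def fit_mf21(m1, m2, f0, f1):
    """Two moment equations plus the r = 0 frequency relation.

    m1, m2 : first two sample factorial moments, E[X] and E[X(X-1)]
    f0, f1 : observed relative frequencies of the values 0 and 1
    Returns (a, k, rho) with a <= k.
    """
    ratio = m2 / m1  # equals (a + 1)(k + 1) / (rho - 2) under the model
    # The three equations are linear in a*k, a+k and rho; eliminate to get rho.
    rho = ((m1 * f0 - f1 * (2.0 * ratio - m1 + 1.0))
           / (m1 * f0 - f1 * (ratio - m1 + 1.0)))
    prod = m1 * (rho - 1.0)                   # a * k
    total = ratio * (rho - 2.0) - prod - 1.0  # a + k
    disc = total ** 2 - 4.0 * prod
    if disc < 0:
        raise ValueError("no real solution for a and k")
    root = math.sqrt(disc)
    # a and k are the two roots of t^2 - (a + k) t + a*k = 0.
    return (total - root) / 2.0, (total + root) / 2.0, rho
```

Because every step is an explicit formula, no iterative solver is involved; the sketch does not guard against a zero denominator in the expression for ρ.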

Variance decomposition
The r-th factorial moment of the generalized Waring distribution is given by the following expression:
It follows immediately that all the moments of order r (central moments about the mean) are infinite if ρ ≤ r; in other words, the mean is finite if ρ > 1 and the variance is finite if ρ > 2. The mean and variance are expressed as follows:
Irwin (1968) obtained the following partition of the variance, when the latter is finite (ρ > 2), into three components; the first of these (σ_R^2) corresponds to random factors, the second (σ_λ^2) to the variability due to external factors that affect the population (liability) and the third (σ_ν^2) to the differences in the internal conditions of the individuals (proneness):
where r = 0, 1, ..., a, k, ρ > 0 and r ∈ ℝ, Γ(.) being the gamma function. The probability generating function of X has the following expression:
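For reference, the factorial moments, the mean and variance, and the three-way partition of the variance can be written as sketched below. These are the standard results for the UGWD(a, k; ρ). The partition is not symmetric in a and k, and the assignment used here (k attached to the liability stage) is an assumption of this sketch; exchanging a and k gives the other admissible split.

```latex
\begin{align*}
\mu_{[r]} &= \frac{(a)_r\,(k)_r}{(\rho-1)(\rho-2)\cdots(\rho-r)}, \qquad \rho > r \\[4pt]
\mu &= \frac{ak}{\rho-1}, \qquad
\sigma^2 = \frac{ak\,(a+\rho-1)(k+\rho-1)}{(\rho-1)^2(\rho-2)} \\[4pt]
\sigma^2 &= \underbrace{\frac{ak}{\rho-1}}_{\sigma_R^2}
          + \underbrace{\frac{ak\,(k+1)}{(\rho-1)(\rho-2)}}_{\sigma_\lambda^2}
          + \underbrace{\frac{a^2k\,(k+\rho-1)}{(\rho-1)^2(\rho-2)}}_{\sigma_\nu^2}
\end{align*}
```

The three terms add up to the variance given on the second line, which can be checked by placing them over the common denominator (ρ − 1)²(ρ − 2).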

Parameter estimation: method of moments
With α = a; β = k; γ = a + k + ρ; λ = 1 (a, k ∈ ℝ; ρ > 0) we obtain the probability generating function of the univariate generalized Waring distribution:
Let us focus on determining the parameters of this distribution using the method of moments, via the recurrence relation among moments with respect to the origin (Marmolejo-Martín, 2003) for the discrete distributions in the system, that is:
Therefore, we have the following system of three equations and three unknowns:
After calculating the first three non-centred moments (α_1, α_2, α_3), â, k̂ and ρ̂ are obtained by solving [2].
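One way to see that three moment equations determine the three parameters is to work with factorial moments, for which the equations become linear in a·k, a + k and ρ. The Python sketch below follows that route. It is an illustration under that assumption and may differ in form from the authors' system [2]; `fit_moments` is a hypothetical name. It requires the third moment to exist (ρ > 3).

```python
import math

def fit_moments(alpha1, alpha2, alpha3):
    """Closed-form moment estimates (a, k, rho) from the raw sample moments."""
    # Convert raw moments to factorial moments E[X], E[X(X-1)], E[X(X-1)(X-2)].
    m1 = alpha1
    m2 = alpha2 - alpha1
    m3 = alpha3 - 3.0 * alpha2 + 2.0 * alpha1
    r1 = m2 / m1  # (a + 1)(k + 1) / (rho - 2) under the model
    r2 = m3 / m2  # (a + 2)(k + 2) / (rho - 3) under the model
    rho = (4.0 * r1 - 3.0 * r2 - m1 - 2.0) / (2.0 * r1 - r2 - m1)
    prod = m1 * (rho - 1.0)                # a * k
    total = r1 * (rho - 2.0) - prod - 1.0  # a + k
    disc = total ** 2 - 4.0 * prod
    if disc < 0:
        raise ValueError("no real solution for a and k")
    root = math.sqrt(disc)
    # a and k are the two roots of t^2 - (a + k) t + a*k = 0.
    return (total - root) / 2.0, (total + root) / 2.0, rho
```

For example, the raw moments 1.5, 7.5 and 79.5 correspond to a UGWD with a = 2, k = 3 and ρ = 5, and the function recovers those values.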

Parameter estimation: two relations between moments and an initial relation between probabilities
An alternative to the method of moments is to consider the relation between probabilities, extracted from the following equation:
The type of data analysed presents very high frequencies for the first categories of the variable, but these decrease very rapidly to residual levels for the higher classes. Regarding table olives, we recorded an average of 4.06 holdings per municipality, with a coefficient of asymmetry of 12.86 (the third quartile had a value of 1), and so the distribution of this variable can be considered highly asymmetric. The same situation was found for oil olive production, with a mean of 44.84 holdings per municipality and an asymmetry of 7.94, although the value of the third quartile was 26. The frequency distribution is illustrated in Fig. 2. A discrete distribution must be used to fit this type of data in order to reflect the asymmetry that is present.
de Córdoba (2,438), and the majority of these 472 towns are in the provinces of Jaén, Córdoba and Granada, in southern Spain. This type of olive production is more widespread in Spain than that of table olives. The maps of the spatial distribution of the two activities (Fig. 1) show that table olive-related production is located mainly in the southern half of the Iberian Peninsula.

Exploratory analysis
The frequency observed for the discrete variable (table olive and oil olive holdings) is shown in Table 1. We analysed the 8,091 Spanish municipalities listed in the 2009 Agricultural Census on land use (with respect to crops). As shown in the frequency table, most municipalities contain very few or no agricultural holdings for the production of table olives; however, 28 Spanish municipalities contain more than 200 such holdings. The municipalities with the highest numbers of these holdings are Arahal (618), Carmona (520) and Marchena (445), all located in the province of Sevilla, in southwest Spain. Regarding the production of oil olives, approximately half of all Spanish municipalities contain at least one holding, while 472 municipalities have more than 200. The largest numbers of such holdings are found in the municipalities of Martos (2,941), Alcalá la Real (2,572) and Priego

Waring distribution adjustment
The system of equations [2] and the system [3] were implemented in R. Table 2 shows the results obtained after fitting the UGWD, applying the method of three relations between moments, MM3, two equations of moments and one equation of frequency, MF21, and using log-likelihood optimisation, MLE (a Newton-type algorithm), which was implemented using the GWRM package of R.
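As an illustration of what log-likelihood optimisation works with, the following Python sketch evaluates the UGWD log-likelihood for a frequency table. It is not the GWRM implementation; the function names and the toy frequency table are hypothetical, and positive a, k and ρ are assumed. The resulting function could be handed to any numerical optimiser.

```python
import math

def ugwd_log_pmf(r, a, k, rho):
    """log P(X = r) for the UGWD(a, k; rho), assuming a, k, rho > 0."""
    return (math.lgamma(a + rho) + math.lgamma(k + rho) - math.lgamma(rho)
            + math.lgamma(a + r) - math.lgamma(a)
            + math.lgamma(k + r) - math.lgamma(k)
            - math.lgamma(a + k + rho + r)
            - math.lgamma(r + 1))

def log_likelihood(freq, a, k, rho):
    """freq maps each observed count r to the number of units with that count."""
    return sum(n * ugwd_log_pmf(r, a, k, rho) for r, n in freq.items())

# Hypothetical frequency table, for illustration only (not census data).
toy_freq = {0: 50, 1: 30, 2: 12, 3: 5, 7: 3}
value = log_likelihood(toy_freq, a=2.0, k=3.0, rho=5.0)
```

Working on the log scale with `math.lgamma` avoids overflow in the gamma functions when counts run into the thousands, as they do for the largest municipalities.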
The expected values according to the different estimation methods are shown in Table 3 and Fig. 3. Due to the large number of cases, only the first 15 are shown. In cases where ρ > 2, the variance can be split (following Irwin) into three factors: randomness, liability and proneness (Table 4). This is a major advantage of the Waring distribution over other discrete distributions.
Table 2. Fitting methods for olive holdings: parameters estimated using the method of three relations between moments (MM3), two equations of moments and one equation of frequency (MF21) and log-likelihood optimisation (MLE) and chi-square goodness of fit test for discrete data: statistic value and p-value (good fit is highlighted in bold).
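The chi-square comparison of observed and expected frequencies can be sketched as follows. This Python block assumes that the last observed class pools all larger counts and that three parameters were estimated; both are assumptions of the sketch, as are the function names. It returns the Pearson statistic and the degrees of freedom, and leaves the p-value to a chi-square table or a statistics library.

```python
import math

def ugwd_probs(a, k, rho, n_classes):
    """Probabilities of 0, 1, ..., n_classes - 2 and of the pooled upper tail."""
    log_p0 = (math.lgamma(a + rho) + math.lgamma(k + rho)
              - math.lgamma(rho) - math.lgamma(a + k + rho))
    probs = [math.exp(log_p0)]
    for r in range(n_classes - 2):
        # Recurrence p_{r+1} = p_r (a + r)(k + r) / ((a + k + rho + r)(r + 1)).
        probs.append(probs[-1] * (a + r) * (k + r)
                     / ((a + k + rho + r) * (r + 1)))
    probs.append(1.0 - sum(probs))  # pooled class: X >= n_classes - 1
    return probs

def chi_square(observed, a, k, rho):
    """Pearson statistic and degrees of freedom for a fitted UGWD."""
    n = sum(observed)
    expected = [n * p for p in ugwd_probs(a, k, rho, len(observed))]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    dof = len(observed) - 1 - 3  # three estimated parameters
    return stat, dof
```

In practice, classes with very small expected counts would be merged before computing the statistic, which changes the number of classes and hence the degrees of freedom.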

Discussion
The quality and health benefits of Spanish olive oil are unarguable (Barranco et al., 2008) and this product is an essential element of the Mediterranean diet (Anta et al., 2005). Spain is the world's largest producer of oil olives (Lambarraa et al., 2007) and an extensive land area is dedicated to their cultivation; thus, olive groves form part of the landscape. From the economic standpoint, this industry is of vital importance; in agricultural production, it is second only to intensive horticulture (Sayadi et al., 2012). Hence, it is important to extend the knowledge and understanding of the tools that enable in-depth analysis of this type of agricultural production.
Other types of distribution have previously been applied in agricultural research. In particular, the binomial distribution, negative binomial distribution, Poisson distribution and mixture models are well known and have been used in numerous studies, in areas as diverse as counting dung patches (Monton & Baird, 1990), crop quantities and farms (Ridout et al., 1998; Kim et al., 2005; Bravo et al., 2006; Paxton et al., 2011), species (Royle, 2004; Brotons et al., 2005; Kery et al., 2005), and the number of food groups consumed (Hirvonen & Hoddinott, 2004). Although the Waring distribution model is relatively unknown for studies based on discrete data, it is in fact very suitable for this application. The Waring distribution is valid when the frequency of occurrence is very low, as is the case with the distribution of olive holdings.
Several methods can be used for estimating the UGWD (a, k; ρ) parameters, including maximum likelihood, the method of moments and methods based on the relations between moments and frequencies. The results obtained by the method of moments show that the distribution is virtually biparametric, as the value of k is practically zero in most cases. Rodríguez-Avi et al. (2003) obtained a value of k < 0.067 using the data reported by Beal & Rescia (1953) and by Katti & Gurland (1961). Canal & Micciolo (1999) also obtained values of k of 0.476 and 0.720 in their fits to patients' psychiatric records.
The methods we recommend produce similar results. The method of two relations between moments and one for frequencies (MF21) produces a good fit and good estimates of the parameters, and presents a significant advantage over the maximum likelihood method, namely its speed of calculation: its equations can be solved directly, without numerical resolution methods.
Table 4. Breakdown of the variance according to the method used to estimate olive holdings in Spanish municipalities: method of three relations between moments (MM3), two equations of moments and one equation of frequency (MF21) and log-likelihood optimisation (MLE). Randomness (R), liability (L) and proneness (P), only for the methods in which the variance is finite (ρ > 2).
This split, as observed above, is a major contribution of the Waring distribution, and one that is not provided by other distributions. We show that the variability arising from external factors, among groups of municipalities, is very high. It is in this case that we might consider including explanatory variables in the model and applying a Waring regression model. Regarding the differences between the two variables analysed, a good fit is obtained for the oil olive variable; the asymmetry of this variable (7.94) is less marked than that of table olives (12.86), and so the following indications are made for readers who may wish to use this type of distribution to fit discrete observations.

Method
Observation of the parameter estimates obtained using the MLE, MM3 and MF21 methods reveals that with MM3 and MF21 the variance is finite and can be split as proposed by Irwin (see the Material and methods section). However, this is not the case with the MLE method, as can be seen for the table olive variable, where ρ < 2. Moreover, the estimators obtained by the method based on the first relationship between moments and the first two relations between probabilities (MF12) cannot be considered, because they do not offer a good fit (for this reason, they are excluded from the present analysis and we do not recommend their use for estimating the Waring distribution). Note that the estimators obtained using the method of moments are considerably higher (Table 2), due to the multiplicative nature of their calculation.
Finally, regardless of the methodology used to estimate the parameters a, k and ρ, we stress that the major advantage of using the Waring distribution is that it allows us to split the variance, thus revealing the behaviour of the distribution relative to the intrinsic randomness in the observations, the external factors that may influence this behaviour (liability) and, finally, the internal differences between individuals (proneness). As can be seen in our split of the variance for the number of olive holdings (Table 4), random variance takes into account all the effects that cannot be explained, and this value is relatively small in both cases.
The term "Liability" refers to the variability present in each municipality, regarding parameters such as size, geographic location and local climate. Therefore, it cannot be attributed to external factors. Proneness, on the other hand, measures the variability between groups of municipalities, according to the number of holdings in each one. The proportion of liability to proneness is very low, indicating that this variable is well described. Splitting the oil olive data shows that the proportion of variability due to random factors is about 0.22% and that only 0.05% is due to external factors, for all the estimation methods used. The differences between municipalities account for 99.73% of the variability, and this plays an important role in the total variability present in the variable studied. Therefore, a large proportion of the variability corresponds to factors that are unknown and uncontrolled, such as climate and regional economic development. In the same way, splitting the table olive data shows that proneness is about 96%.
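The percentages discussed above come from evaluating the three variance components at the estimated parameters. A minimal Python sketch of that calculation is given below, using the standard three-term split of the UGWD variance. The split is not symmetric in a and k, and attaching k to the liability stage is an assumption of this sketch; `irwin_split` and the parameter values in the usage line are hypothetical.

```python
def irwin_split(a, k, rho):
    """Variance components of a UGWD(a, k; rho) and their percentage shares.

    Returns a dict mapping component name to (value, percent of total).
    Assumes k is the liability-stage parameter; swapping a and k gives
    the other admissible split.
    """
    if rho <= 2:
        raise ValueError("the variance is infinite unless rho > 2")
    randomness = a * k / (rho - 1)
    liability = a * k * (k + 1) / ((rho - 1) * (rho - 2))
    proneness = a ** 2 * k * (k + rho - 1) / ((rho - 1) ** 2 * (rho - 2))
    total = randomness + liability + proneness
    return {
        "randomness": (randomness, 100.0 * randomness / total),
        "liability": (liability, 100.0 * liability / total),
        "proneness": (proneness, 100.0 * proneness / total),
    }

# Illustrative parameter values, not estimates from the census data.
parts = irwin_split(a=2.0, k=3.0, rho=5.0)
```

The guard on ρ mirrors the condition under which the split is defined: when an estimation method returns ρ ≤ 2, no breakdown can be reported for that fit.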

Figure 1. Distribution of the number of table olive (a) and oil olive (b) holdings in Spanish municipalities.

Figure 2. Box plot of the distribution of olive holdings in Spanish municipalities: both distributions are highly asymmetric, but especially that for oil olives.

Figure 3. Observed and fitted distribution of olive holdings in Spain (for clarity, only the first values are shown), showing the method of three relations between moments (MM3), two equations of moments and one equation of frequency (MF21) and log-likelihood optimisation (MLE).

Table 1. Observed frequency of table and oil olive holdings in Spanish municipalities.

Table 3. Observed frequencies for olive holdings in Spanish municipalities (only the first values) and estimated frequencies using the method of three relations between moments (MM3), two equations of moments and one equation of frequency (MF21) and log-likelihood optimisation (MLE).