The univariate generalized Waring distribution (UGWD) is presented as a new model for fitting count data in the context of agriculture. In this paper, it was used to model the number of olive groves in each of the 8,091 Spanish municipalities recorded in the 2009 Agricultural Census, according to which the production of oil olives accounted for 94% of total output, while that of table olives represented 6% (with an average of 44.84 and 4.06 holdings per Spanish municipality, respectively). UGWD is suitable for fitting this type of discrete data, which is strongly asymmetric, with most observations concentrated at the lowest values. This novel use of UGWD can provide the foundation for future research in agriculture, with the advantage over other discrete distributions that it enables the analyst to split the variance. After defining the distribution, we analysed various methods for fitting its parameters, namely estimation by maximum likelihood, estimation by the method of moments and a variant of the latter, estimation by the method of frequencies and moments. For oil olives, the chi-square goodness of fit test gives

The question of olive production has aroused much interest in Spain and other areas, as olive oil (especially virgin oil) is the cornerstone of the Mediterranean diet; its consumption has been associated with a lower risk of cardiovascular disease (

Spain is the world’s largest producer of olive oil, and the province of Jaén (in the south of the Iberian Peninsula) contains the country’s highest concentration of olive groves, producing more oil than Italy, the second largest producer country. The most abundant variety in Jaén is Picual, although other varieties, such as Royal, Arbequina and Cornicabra, are also grown. According to the Ministry of Agriculture, Food and Environment (

Some of the distributions used to model discrete data are well known. The most commonly used is the Poisson distribution, which is simple and widely applied. However, it underestimates the variance when the data are overdispersed; to overcome this problem, mixture distributions have been proposed, such as the negative binomial distribution, obtained by mixing the Poisson distribution with a Gamma distribution. Another mixture, which has been applied to address issues in the field of ecology (

The generalized Waring distribution (

The generalized Waring distribution has been applied in many scientific fields.

The micro-data used in this study were obtained from the Agricultural Census (

A random variable X follows a univariate generalized Waring distribution (

where

where _{2}F_{1} (

With

Let us focus on determining the parameters of this distribution; using the method of moments, via the recurrence relation among moments with respect to the origin (

Therefore, we have the following system of three equations and three unknowns:

After calculating the first non-centred moments (_{1}, _{2}, _{3}),

An alternative to the method of moments is to consider the relation between probabilities, extracted from the following equation:

∀_{r}

Another possibility is the method of one relation between moments and two between probabilities (MF12). However, careful analysis shows that this method does not fit the data acceptably; therefore, it should not be used to estimate the parameters of this distribution.
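All of the estimation methods discussed above rely on the probabilities of the UGWD. As a minimal sketch, and assuming the usual three-parameter form UGWD(a, k, ρ) with a, k, ρ > 0 (the function and argument names below are illustrative and do not come from the paper), the probability mass function and the log-likelihood of grouped data could be evaluated as follows. Working on the log scale with the log-gamma function avoids overflow for the large counts that occur in strongly asymmetric data.

```python
import math


def ugwd_log_pmf(x, a, k, rho):
    """Log of P(X = x) for an assumed UGWD(a, k, rho), x = 0, 1, 2, ..."""
    lg = math.lgamma
    # Log of the normalising constant:
    # Gamma(a + rho) * Gamma(k + rho) / (Gamma(rho) * Gamma(a + k + rho))
    log_const = lg(a + rho) + lg(k + rho) - lg(rho) - lg(a + k + rho)
    # Log of (a)_x (k)_x / ((a + k + rho)_x x!), where (c)_x is the
    # Pochhammer symbol Gamma(c + x) / Gamma(c)
    log_term = (
        lg(a + x) - lg(a)
        + lg(k + x) - lg(k)
        - (lg(a + k + rho + x) - lg(a + k + rho))
        - lg(x + 1)
    )
    return log_const + log_term


def ugwd_pmf(x, a, k, rho):
    """P(X = x) for an assumed UGWD(a, k, rho)."""
    return math.exp(ugwd_log_pmf(x, a, k, rho))


def ugwd_log_likelihood(freqs, a, k, rho):
    """Log-likelihood of grouped data.

    freqs maps each observed count x to the number of units
    (here, municipalities) in which that count was observed.
    """
    return sum(n * ugwd_log_pmf(x, a, k, rho) for x, n in freqs.items())
```

A maximum likelihood fit would maximise `ugwd_log_likelihood` numerically over (a, k, ρ); the choice of optimiser is left open here. As a quick sanity check, with a = k = 1 and ρ = 2 the probabilities reduce to 4/((x+1)(x+2)(x+3)), which sum to one.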

The

It is immediately obtained that all the moments of order

The first component (σ^{2}_{R}) corresponds to random factors; the second (σ^{2}_{λ}), to the variability due to external factors that affect the population (liability); and the third (σ^{2}_{ν}), to the differences in the internal conditions of the individuals (proneness):
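For reference, the split commonly attributed to Irwin can be written as below. This is a sketch under the assumption that the distribution is parameterised as UGWD(a, k, ρ) with ρ > 2; the symbols a, k and ρ are assumed here rather than taken from the text. Because the distribution is symmetric in a and k, which of the last two terms is labelled liability and which proneness depends on the order in which those two parameters are taken.

```latex
\sigma^{2}
  = \underbrace{\frac{ak}{\rho-1}}_{\sigma^{2}_{R}\ \text{(random)}}
  + \underbrace{\frac{ak(k+1)}{(\rho-1)(\rho-2)}}_{\sigma^{2}_{\lambda}\ \text{(liability)}}
  + \underbrace{\frac{a^{2}k(k+\rho-1)}{(\rho-1)^{2}(\rho-2)}}_{\sigma^{2}_{\nu}\ \text{(proneness)}},
  \qquad \rho > 2,

\sigma^{2} = \frac{ak\,(a+\rho-1)(k+\rho-1)}{(\rho-1)^{2}(\rho-2)} .
```

The second line is the total variance; adding the three components over the common denominator (ρ-1)^{2}(ρ-2) recovers it.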

The frequency observed for the discrete variable (table olive and oil olive holdings) is shown in

The type of data analysed presents very high frequencies for the first categories of the variable, but these decrease very rapidly to residual levels for the higher classes. Regarding table olives, we recorded an average of 4.06 holdings per municipality, with a coefficient of asymmetry of 12.86 (the third quartile had a value of 1), and so the distribution of this variable can be considered highly asymmetric. The same situation was found for oil olive production, with a mean of 44.84 holdings per municipality and an asymmetry of 7.94, although the value of the third quartile was 26. The frequency distribution is illustrated in

The system of equations

The expected values according to the different estimation methods are shown in

The quality and health benefits of Spanish olive oil are unarguable (

Other types of distribution have previously been applied in agricultural research. In particular, the binomial, negative binomial and Poisson distributions and mixture models are well known and have been used in numerous studies, in areas as diverse as counting dung patches (

Several methods can be used for estimating the UGWD

The methods we recommend produce similar results. The method of two relations between moments and one between frequencies (MF21) produces a good fit and good estimates of the parameters, and it has a significant advantage over the maximum likelihood method, namely its speed of calculation: its equations can be solved directly, without numerical resolution methods.

Regarding the differences between the two variables analysed, a significant fit is obtained for the oil olive variable, whose asymmetry (7.94) is less marked than that of table olives (12.86). The following indications are therefore offered to readers who may wish to use this type of distribution to fit discrete observations.

Observation of the parameter estimates obtained using the MLE, MM3 and MF21 methods reveals that with MM3 and MF21 the variance is finite and can be split as proposed by Irwin (see the Methodology section). However, this is not the case with the MLE method, as can be seen for the table olive variable, where

Finally, regardless of the type of methodology used to estimate

The term “Liability” refers to the variability present in each municipality, regarding parameters such as size, geographic location and local climate. Therefore, it cannot be attributed to external factors. Proneness, on the other hand, measures the variability between groups of municipalities, according to the number of holdings in each one. The proportion of liability to proneness is very low, indicating that this variable is well described. Splitting the oil olive data shows that the proportion of variability due to random factors is about 0.22% and that only 0.05% is due to external factors, for all the estimation methods used. The differences between municipalities account for 99.73% of the variability, and this plays an important role in the total variability present in the variable studied. Therefore, a large proportion of the variability corresponds to factors that are unknown and uncontrolled, such as climate and regional economic development. In the same way, splitting the table olive data shows that proneness is about 96%.
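The percentages quoted above come from dividing each variance component by the total variance. A minimal sketch of that calculation follows, assuming the parameterisation UGWD(a, k, ρ) with ρ > 2 and the three-term split attributed to Irwin in the Methodology section. The function name and the returned keys are illustrative, and the assignment of the last two terms to liability and proneness depends on the order of a and k, since the distribution is symmetric in those parameters.

```python
def ugwd_variance_split(a, k, rho):
    """Split the variance of an assumed UGWD(a, k, rho) into three parts.

    Returns the absolute components, their total, and each component's
    share of the total. The variance is finite only when rho > 2.
    """
    if rho <= 2:
        raise ValueError("the variance is finite only when rho > 2")

    # Random component: equal to the mean of the distribution
    random_part = a * k / (rho - 1)
    # Liability component
    liability = a * k * (k + 1) / ((rho - 1) * (rho - 2))
    # Proneness component
    proneness = a ** 2 * k * (k + rho - 1) / ((rho - 1) ** 2 * (rho - 2))

    total = random_part + liability + proneness
    components = {
        "random": random_part,
        "liability": liability,
        "proneness": proneness,
    }
    shares = {name: value / total for name, value in components.items()}
    return {"components": components, "total": total, "shares": shares}
```

Passing the parameter estimates from any of the fitting methods to `ugwd_variance_split` and multiplying the shares by 100 gives percentages comparable to those reported here, provided the estimated ρ exceeds 2.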

This split, as observed above, is a major contribution of the Waring distribution, and one that is not provided by other distributions. We show that the variability arising from external factors, among groups of municipalities, is very high. It is in this case that we might consider including explanatory variables in the model and applying a Waring regression model.