Modelling diameter distributions of Betula alba L . stands in northwest Spain with the two-parameter Weibull function

The diameter distributions of 125 permanent plots installed in birch dominated (Betula alba L.) stands in Galicia were modelled with the two-parameter Weibull distribution. Four different fitting methods were used: that based on percentiles of the distribution, non linear regression, maximum likelihood and the method of moments. The most accurate fit was obtained with the non linear regression (NLR) approach, considering the following statistics in the comparison: bias, mean absolute error (MAE), mean square error (MSE) and number of plots rejected by the Kolmogoroff-Smirnoff (KS) test. The scale parameter (b) and the shape parameter (c) obtained with the most accurate method (non linear regression), were first modelled with simple linear models and then related to commonly measured prediction variables (quadratic mean diameter, dominant height and stand density) with the parameter prediction model (PPM). The parameters fitted with the method of moments were recovered with the parameter recovery model (PRM) from the first and the second moments of the distribution (mean diameter and variance, respectively). Results indicated that both methods were successful in predicting the diameter frequency distributions. The PRM was more accurate than the PPM method.


Introduction
Birch (Betula alba L.) forests are a valuable natural resource in northwest Spain but a lack of studies about the profitability of the wood, as well as a lack of proper management have led to their general abandonment.At present birch stands cover 49,000 ha in northwest Spain (32,000 ha in Galicia and 17,000 ha in the adjoining region of Asturias) as dominant tree species, derived mainly from natural regeneration, frequently on abandoned land, and occasionally in plantations.Birch is the 5 th species in terms of number of trees in Galicia, behind Eucalyptus globulus, Pinus pinaster, Quercus robur and Quercus pyrenaica.The total volume of Galician birch stands is nearly 3 million m 3 , which represents 2.3 % of the total wood produced in the region (Xunta de Galicia, 2001).These stands are of great ecological and protector value because they are usually located in hilly areas in the study region, under cold climates and adverse soil conditions.
Forest managers must respond to current demands and consider forests as multi-purpose sites (in terms of carbon storage, landscape value, as recreational sites, etc.); for this they need tools for evaluating different management practices and their effects on stand structure.Development of a growth models for the species (Rojo et al., 2005;Diéguez-Aranda et al., 2006) enables promotion of the productive and protective aspects of birch stands in northwest Spain.Diameter class models allow prediction of stand growth and planning of various uses and provide data about stand structure.These models are used to estimate stand variables and their structure with a density or distribution function, which is fitted to diameter distributions at breast height or individual tree volume.
There is, for example, great interest in knowing the number of trees in a diameter class in a stand in which a particular silvicultural treatment has been applied, because the diameter sizes determine the industrial use of the wood and thus the price of the different products.Diameter distributions also give information about stand structure, age structure, stand stability, etc. and enables planning of silvicultural treatments.Furthermore, tree diameter is an important factor in harvesting because it determines the type of machines used and how they perform during felling and transportation of the wood.
The Weibull probability density function was first used for modelling diameter distributions of pure and even-aged stands (Bailey and Dell, 1973), and since then it has been used in many growth models based on diameter distributions because of its flexibility and simplicity (Rennolls et al., 1985;Maltamo et al., 1995;Kangas and Maltamo, 2000;Zhang et al., 2003;Liu et al., 2004).
The two-parameter Weibull function has been reported to be the most simple and accurate for modelling diameter distributions in birch stands (Betula alba L.) in northwestern Spain (Gorgoso, 2003), in accordance with Maltamo et al. (1995) for Pinus sylvestris and Picea abies stands in Finland, Álvarez-González (1997) for Pinus pinaster stands, and Condés (1997) for different species in Spain.These authors discussed the advantage of the use of the two-parameter Weibull probability density function over the three parameter function.
In this models two types of methods have been developed for modelling the parameters with stand variables (Hyink and Moser, 1983): Parameter prediction models (PPM) and parameter recovery models (PRM).Parameter prediction models (Smalley and Bailey, 1974;Álvarez-González, 1997;Torres-Rojo et al., 2000) relate the parameters of a distribution function and stand variables by use of simple linear models.Parameter recovery models (PRM) were developed to improve the estimates obtained with the PPM models, and relate stand variables to percentiles (Bailey et al., 1982;Cao and Burkhart, 1984) or moments (Newby, 1980;Burk and Newberry, 1984;Lindsay et al., 1996;Río, 1999) of the diameter distribution.
The main aim of the study was to compare the accuracy of the two types of methodologies (PPM and PRM) with the two-parameter Weibull function and to compare four methods of fit: non linear regression, maximum likelihood, moments and percentiles, by applying them to data corresponding to 125 diameter distributions of Betula alba stands in northwest Spain.

Data set and processing
The data used to develop the diameter distribution model were obtained from a total of 125 research plots in Betula alba L. stands throughout Galicia (northwest Spain), which were measured in 1998 and 1999.Birch is considered as a shade intolerant species in this area, so the stands are mainly even-aged.The size of the plots ranged from 200 m 2 to 1,000 m 2 , depending on the stand density, in order to achieve a minimum of 30 trees per plot.The plots were established in birch-dominant stands (more than 85% of birch standing basal area), and were subjectively established to cover a wide variety of combinations of age, density and site.
All trees in each plot were numbered.Two perpendicular diameters at breast height were measured with callipers to the nearest 0.1 cm, and the arithmetic average was calculated.The minimum diameter inventoried was 5 cm.A total of 10,386 diameter measurements were available for analysis in the study.Furthermore, the heights of a random sample of 30 trees in each plot were measured to the nearest 0.25 m with a Blume-Leiss, for calculating the mean height.Dominant height was calculated from the percentage of the 100 thickest trees per ha (Assmann, 1970), and age was estimated by counting growth rings in samples extracted from some stems.
The following stand variables were calculated from the inventory data: age, density, quadratic mean diameter, basal area and mean and dominant height.Summary statistics including mean, maximum and minimum values and standard deviation of the main stand variables are shown in Table 1.
Other distribution variables of diameters were also calculated: mean diameter and the following percentiles: 16.731%, 25%, 50%, 75%, 63% and 97.366%, which were calculated by five different methods with SAS/STAT TM (SAS Institute Inc., 2001): Method 3: Method 5: where n is the number of data points, t is the estimated percentile per one, j is the entire part of the product of n • t, i is the entire part of the product of n • t plus 0.5, g is the decimal part of the product n • t, x(j) the diameter sitting in the j position when they are ordered from minimum to maximum values, x(j + 1) is the diameter in the position j + 1 with the same order, k is the entire part of the product (n + 1) • t and h is the decimal part of the product (n + 1) • t.

Fitting methods
The two-parameter Weibull distribution (with the situation parameter a = 0) is obtained by integrating the Weibull density function, and has the following expression for a random variable x: [6] where F(x) is the cumulative relative frequency of trees with diameter equal to or smaller than random variable, b is the scale parameter and c is the shape parameter.
Four methods of estimating the two-parameter Weibull distribution were compared: that based on percentiles of the distribution, non linear regression (NLR), maximum likelihood and moments.

Method of percentiles
The method of percentiles for the estimation of the two parameter Weibull function was computed by Dubey (1967), Shiver (1988), Newberry et al. (1993) and Condés (1997).The values of the parameters were obtained with the following expressions: [7] [8] Shiver (1988) proposed values of r = 0.97366 and t = 0.16731, which were used in the present study.

Non Linear Regression (NLR)
The Weibull distribution is a non linear model, with the following general expression (Seber and Wild, 1989) for a dependent variable (y) and the independent variable (x): [9] where m is a non linear function that depends on the parameter vector (α -), which must be estimated, and ε i the errors with the same criteria as the linear model (normal distribution, homocedasticity and independence).Estimation of the parameter vector (α -) was carried out by minimum squares with the PROC NLIN procedure in SAS/STAT TM (SAS Institute Inc., 2001).The scale and shape parameters of the Weibull distribution were obtained with the non linear regression method by Álvarez-González (1997) and García-Güemes et al. (2002).The procedure was initiated with the values of the parameters obtained by the percentile method and the convergence of the parameters was obtained with the Gauss-Newton algorithm (Hartley, 1961).

Maximum Likelihood
The maximum likelihood estimation method used by Condés (1997), Nanos and Montero (2002) or Eerikäinen and Maltamo (2003) allows calculation of the two distribution parameters with the following equations: [10] [11] where n equals the number of sample observations in a Weibull distribution and x i (cm) the diameter of each tree.To calculate the two parameters (b and c) the LIFEREG procedure in SAS/STAT TM (SAS Institute Inc., 2001) was used.

Method of Moments
The method of moments used by Shifley and Lentz (1985), Nanang (1998) and Río (1999) is based on the relationship between the two parameters of the Weibull function and the f irst and second moments of the diameter distribution (mean diameter and variance, respectively): [12] [13] where d is the mean diameter of the distribution, σ 2 the variance and Γ(i) is the Gamma function for each point (x = i).Equation [13] was resolved by an iterative procedure with the bisection method (Gerald and Wheatley, 1989).

Method comparison
The fitting method consistency was evaluated by the bias, mean absolute error (MAE), and mean square error (MSE), with the following expressions: where Y i is the observed value, Y ˆi is the theoretical value predicted by the model and N is the number of data points.Mean square error (MSE), mean absolute error (MAE) and bias were calculated for each fit in mean relative frequency of trees per one for all diameter classes and plots.
Each f it was also tested with the Kolmogoroff-Smirnoff (KS) test.This test compares the cumulative estimated frequency with the observed frequency.The most striking difference between the two distributions was the D value of the KS test: where S(x) is the cumulative frequency distribution observed for the sample x i (i = 1, 2, ..., n) and F(x) the probability of the theoretical cumulative frequency distribution.The significance level was established at 20% and diameter classes of 1 cm intervals were chosen.

Modelling distribution parameters with stand variables
Correlation analysis was carried out by estimating Pearson's correlation coefficients between parameters corresponding to each plot and fitted by non linear regression and different stand variables or diameter distribution variables that can be estimated from stand variables with simple regression models.The variables chosen were: mean diameter of the distribution (d -), quadratic mean diameter of the stand (d g ), median (P 50 ), minimum diameter (D MIN ), maximum diameter (D MAX ), 25% percentile (P 25 ) and 75% percentile (P 75 ) of the distribution, density (N), basal area (G), dominant height (H 0 ) and median height (H m ) of the stand, and the natural logarithm of the cocient of the minimum diameter, the 25% percentile, the mean diameter and the median by the quadratic mean diameter (LD MIN, LP 25 , L d and LP 50 , respectively), in accordance with Álvarez-González (1997).
Pairs of parameters and variables were related with simple linear models by use of parameter prediction models (PPM).These equations were fitted simultaneously in order to correct errors because some variables are independent in some models and then become dependent, because they can not be obtained directly from yield tables.The simultaneous fitting was carried out with the FIML (full information maximum likelihood) method, by the PROC MODEL procedure in SAS/ETS (SAS Institute Inc., 2001).
On the other hand, parameters fitted by the moment's method were recovered with parameter recovery models (PRM), because the goodness of fit was better than that obtained with the percentiles method.With this method the mean diameter and the variance must be known (first and second order moments of the distribution, respectively).The variance was estimated with the following expression: [17] The mean diameter (d -) was related to the quadratic mean diameter (d g ), disaggregated from yield tables for birch stands in northwest Spain (Rojo et al., 2005), and to stand variables that can be obtained from yield tables, with models in which mean diameter estimated is always smaller than quadratic mean diameter (Frazier, 1981): [18] where X is the vector of independent stand variables in a fixed instant and β is the vector of parameters to be estimated.For selecting the model, the following statistics obtained from residues were tested: bias [Eq.14], mean square error (MSE) obtained in this case with the equation [19] and adjusted determination coefficient (R 2 adj ): [19] [20] where Y i , Y ˆi and Y are, respectively, the observed, predicted and mean values of the dependent variable; N is the number of the data points in the fit and p is the number of parameters to be estimated.

Fitting the two-parameter Weibull distribution
The mean values of bias, mean absolute error (MAE) and mean square error (MSE) obtained in the fits by the percentile approach, and the number of plots rejected by the KS test are shown in Table 2.The mean values of bias, MAE and MSE were similar for all methods of calculating percentiles.The number of plots rejected by the KS test was similar for the most accurate methods (approximately 34%).
The mean values of bias, MAE and MSE, and the number of events rejected by the KS test in the fits with non linear regression, maximum likelihood and moments are shown in Table 3.The most accurate fits, in terms of bias, MSE and number of rejections by the KS test (only 1 plot), were obtained with the non linear regression (NLR) approach and the least accurate fits were obtained with the maximum likelihood approach.
The results obtained with the method of moments were similar to those obtained with the percentiles method, and only better than NLR in terms of MAE.The number of events rejected by the KS test (42 plots), was 34% of the total, similar to percentage rejected by the percentiles and maximum likelihood approaches.
The mean values of bias and MSE for each diameter class obtained with the four methods of f itting are shown in Figure 1 and 2, respectively (in the fit with percentiles calculation method 2 was used because it was the most accurate).High values of bias were observed in the smallest diameter classes with the maximum likelihood, moments and percentile approaches, and then the values decreased suddenly.The method that provided greatest stabilization of the bias throughout the diameter range was non linear regression (Fig. 1).
The maximum likelihood method provided high values of MSE up to a diameter of 26.5 cm, and then stabilized (Fig. 2).The non linear regression approach provided the most consistent results in terms of MSE, with values often less than 0.001 for each diameter class, always tending to decrease with increasing size of diameter classes; for the largest diameter class the results were similar to those obtained by the maximum likelihood approach, but lower than those based on percentiles and moments of the distribution.Nevertheless, in this diameter range the number of data points in the fits was less, with only 2.34% of trees with diameter larger than 25 cm.Furthermore, these methods were less accurate for the smallest diameter classes, and thus   the mean value of MSE for the total number of data was larger.

Modelling function parameters
The parameters fitted with the most accurate method (NLR) were predicted with the Parameter Prediction Model (PPM).Pearson's linear correlations for parameters b and c and stand variables and also diameter distribution variables are shown in Table 4.
The results show a good linear relationship between the scale parameter (b) and the measurement of central tendency.The value of the adjusted determination coefficient obtained was 0.99, in a model that relates b to the quadratic mean diameter (d g ).b = -0.705+ 1.063 • d g (cm) R 2 = 0.99 [21] For the shape parameter (c) the highest value of Pearson's linear correlation (0.794) corresponded to the natural logarithm of the division between the 25% percentile and the quadratic mean diameter (LP 25 ).
Nevertheless the 25% percentile (P 25 ) cannot be obtained directly from yield tables and it was thus estimated by simultaneously fitting equation ( 22) with another that allows this percentile to be obtained from the density of the stand (N) and the dominant height (H 0 ), with the FIML method (full information maximum likelihood).
On the other hand, the parameter recovery model (PRM) was applied with the moments method of fitting.The results of the modelling of the mean diameter with the quadratic mean diameter (d g ) disaggregated from the yield tables for Betula alba in northwest Spain and with other stand variables are shown in Table 5.The values of the statistics used in the comparison the final results obtained with PPR fitted by non linear regression and PRM f itted by the method of moments after modelling the parameters are shown in Table 6.The mean values of bias and MSE in each diameter class obtained with PPR and PRM are shown in Figure 3.

Discussion
The large number of plots rejected by the KS test for the percentiles, maximum likelihood and moments methods implies that these methods are not appropriate for fitting diameter distributions of birch in northwest Spain.The results are consistent with those of García-Güemes et al. (2002), who obtained more accurate fits with the non linear regression method than with the method of percentiles for Pinus pinea stands in Valladolid (Spain).
Maximum likelihood, moments and percentile approaches underestimated the frequencies of trees in the smallest diameters class, although for the largest diameter class, the percentile approach was the most accurate.The maximum likelihood approach predicted larger values than the observed distribution for diameter classes of 30.5 cm and above, and never stabilized to values of bias close to zero, which is consistent with the poor results obtained by Condés (1997).Nevertheless, this method of fitting was applied successfully with the two-parameter Weibull model of Nanos and Montero (2002) in Pinus pinaster stands in Spain, and by Eerikäinen and Maltamo (2003) in Pinus kesiya stands in Zambia and Zimbabwe.Zhang et al. (2003) obtained better results with this method than with the moments and percentiles approaches in conjunction with the three-parameter function for mixed sprucefir stands in northeastern North America.Palahí et al. (2006aPalahí et al. ( , 2006b) ) fitted the left-truncated Weibull for different species in Catalonia for a truncation diameter of 7.5 cm.Parameters b and c were estimated by the maximum likelihood approach and the frequency of trees was predicted in trees/ha.High values of the root mean squared error were obtained for the smallest diameter classes.
In the case of the NLR approach, frequency distributions were often overestimated, with values of bias less than -0.02.The accuracy was lower than with the percentiles and method of moments for the largest diameter classes, but the shortage of data for these diameter classes implies less reliable results.

A B
The method of percentiles is very accurate for the largest diameter classes, which are of higher economic value.Nevertheless, because of poor results for the smallest diameters and the large number of plots rejected by the KS test, this method is not recommended.The moments approach provided similar results, which were even better for some diameter classes.This is consistent with the findings of Nanang (1998), who obtained more accurate results with the method of moments than with the percentiles approach for Azadirachta indica plantations in Ghana.Bailey and Dell (1973) concluded that method of moments is more accurate than the percentile method, but pointed out that the calculations are more complicated.
Considering the changes in bias and MSE per diameter class (Figs. 1 and 2), the mean values of bias, MAE and MSE, and the number of rejections by the KS test (Tables 2 and 3), the non linear regression approach may be the most suitable for fitting the diameter distributions of birch stands in northwest Spain, with the two-parameter Weibull distribution.
As regards the parameter modelling, the parameter prediction model (PPM) avoids some of the inconveniences of the parameter recovery model (PRM) (Borders and Patterson, 1990;Nepal and Somers, 1992;Vanclay, 1995).Furthermore, the use of the two-parameter model avoids having to predict the situation parameter a (a = 0 in this case), which is always complex and not very accurate (Ortega, 1989;Maltamo et al., 1995).
The Pearson's correlation analysis revealed close linear correlation between the scale parameter b and the quadratic mean diameter (d g ), in accordance with Álvarez-González (1997) and García-Guëmes et al. (2002) who obtained linear models for this relationship, with values of adjusted determination coefficient (R 2 adj ) close to 99%, higher than that obtained by Rennolls et al. (1985) for sitchensis stands.
For modelling the shape parameter c by use of two simultaneous fits of linear models, the relationships between errors in the estimation and the endogenous error in the dependent variables have been considered (Torres-Rojo et al., 2000).The independent variables in the models are quadratic mean diameter (d g ), density (N) and dominant height (H 0 ), which are all easily obtained from yield tables.In both models the values of adjusted determination coefficient (R 2 adj ) were high (0.89 and 0.63, respectively) considering that the prediction of this parameter is often not very accurate.Rennolls et al. (1985) and Maltamo et al. (1995) for example, only explained a small part of the total variability in the parameter (5% and a 31% respectively), in the first case with a parabolic model including the mean diameter, and in the second case, with a linear model including the quadratic mean diameter and the natural logarithm of the age.Palahí et al. (2006a) developed parameter prediction models for Pinus sylvestris, Pinus nigra and Pinus halepensis in Catalonia using the truncated Weibull function.These authors used regression analysis to establish the relationship between Weibull parameters and stand basal area, number of trees per hectare and elevation of the site.
When the parameters were recovered, the most accurate model for modelling mean diameter was that predicting the mean diameter without the stand density (N), which was not significant (at a level of 5%).Thus mean diameter was modelled with quadratic mean diameter (d g ), dominant height (H 0 ) and age (t).In this case the value of the adjusted determination coefficient (R 2 adj ) was 0.94.The final results showed that PRM provided more accurate estimates than PPM in terms of MSE and MAE (see Table 6), although the KS test rejected more of the events.Thus, the parameter modelling with PRM was more accurate because similar final values were obtained in terms of the number of events rejected by the KS test for both methods in relation to the large differences in the fits (one plot rejected by NLR approach and 42 plots with the method of moments).Recently Liu et al. (2004) found that PRM performed better than PPM in terms of predicting error for plantations of black spruce (Picea mariana) in central Canada.
In conclusion, the NLR approach was found to be more accurate than method of moments, maximum likelihood and percentiles for fitting the two-parameter Weibull distribution to diameter distributions of birch in northwest Spain.Parameter prediction models and parameter recovery models were successful methods of modelling the function parameters, with the PRM being more eff icient for estimating the theoretical distribution.
Folgueira, Joaquín Pedre and Darío Ferreiro for help in analysing the data set.

5 Figure 1 .
Figure 1.Mean value of bias in number of trees per one in each diameter class obtained by four fitting methods of the two-parameter Weibull distribution.

Figure 2 .
Figure 2. Mean value of mean square error (MSE) in number of trees per one in each diameter class obtained by four fitting methods of the two-parameter Weibull distribution.

Figure 3
Figure 3 (A and B).Mean value of bias and MSE in number of trees per one in each diameter class with the PPM (Parameter Prediction Model) and PRM (Parameter Recovery Model).

Table 1 .
Summary of main stand variables for the sample used for modelling(N = 125)

Table 2 .
Mean values of bias, mean absolute error, mean square error in number of trees per one and number of plots rejected by the KS test (α = 0.20) for five different methods of calculation of percentiles MAE: mean absolute error.MSE: mean square error.KS: number of rejections by the Kolmogoroff-Smirnoff test.

Table 3 .
Mean values of bias, mean absolute error in number of trees per one and number of plots rejected by the KS test (α = 0.20) for three fitting methods for two-parameter Weibull distribution

Table 5 .
Parameter estimations and statistics used for modelling the mean diameter (d -) of the stand

Table 6 .
Mean values of bias, mean absolute error, mean square error in number of trees per one and number of plots rejected by the KS test (α = 0.20) for the two modelling methods: Parameter Prediction Model and Parameter Recovery Model