Comparing Johnson's SBB, Weibull and Logit-Logistic bivariate distributions for modeling tree diameters and heights using copulas

Aim of study: In this study we compare the accuracy of three bivariate distributions, Johnson's SBB, Weibull-2P and LL-2P, for characterizing the joint distribution of tree diameters and heights. Area of study: Northwestern Spain. Material and methods: Diameter and height measurements from 128 plots of pure and even-aged Tasmanian blue gum (Eucalyptus globulus Labill.) stands located in northwestern Spain were considered in the present study. The SBB bivariate distribution was obtained from SB marginal distributions using a Normal copula based on a four-parameter logistic transformation. The Plackett copula was used to obtain the bivariate models from the Weibull and Logit-logistic univariate marginal distributions. The negative log-likelihood at the fitted parameters was used to compare the results, and the Wilcoxon signed-rank test was used to compare the related samples of these values calculated for each sample plot and each distribution. Main results: The best results were obtained by using the Plackett copula, and the best marginal distribution was the Logit-logistic. Research highlights: The copulas used in this study performed well in modeling the joint distribution of tree diameters and heights. They could be easily extended for modelling multivariate distributions involving other tree variables, such as tree volume or biomass.


Introduction
Stand volume, one of the most important variables in forest management, is usually estimated from sampled tree diameters and heights (Wang & Rennolls, 2007). The common practice is to obtain the height data from a subsample of trees for which diameters are available, and to fit an empirical height-diameter relationship to estimate the average height per diameter class. Tree volume is then estimated using an individual-tree volume equation. Although this approach may appear satisfactory, it is often not appropriate because it ignores the fact that height may vary considerably for a given diameter due to genetic, environmental or silvicultural factors (Schreuder & Hafley, 1977) and stand structural diversity (Staudhammer & LeMay, 2001).
Hence there has been considerable interest in identifying suitable bivariate distributions to describe diameter-height frequency data. For many years, the bivariate extension of the SB distribution, the SBB (Johnson, 1949), was the only bivariate distribution used for modeling bivariate tree diameter-height frequency data (e.g. Hafley & Schreuder, 1977; Knoebel & Burkhart, 1991; Tewari & Gadow, 1999; Castedo Dorado et al., 2001; Zucchini et al., 2001). Johnson's SBB is developed by applying a four-parameter logistic transformation to each of the component variables of a standard bivariate normal distribution (Johnson, 1949; Rennolls & Wang, 2005). Constructing any other analytic bivariate distribution without resorting to a transformation of a bivariate normal distribution is complicated (Wang & Rennolls, 2007). However, copula functions provide a general way of constructing multivariate distributions. In recent years, several authors have used the approach described by Sklar (1973), which joins a multivariate distribution from its one-dimensional marginal distributions (e.g. Li et al., 2002; Wang & Rennolls, 2007; Wang et al., 2008).
The objective of the present study is to fit and compare the accuracy of three bivariate distributions, Johnson's SBB, Weibull and Logit-Logistic, using diameter-height data from pure and even-aged stands of Eucalyptus globulus in northwestern Spain. The Weibull and the Logit-Logistic (LL) bivariate distributions, denoted Weibull-2P and LL-2P, were obtained from Weibull and LL marginal distributions by using the Plackett copula, whereas the SBB bivariate distribution was obtained from SB marginal distributions using the Normal copula, i.e., by applying a four-parameter logistic transformation to each of the component variables.

Data
All tree diameters and heights were measured in 128 field plots in Tasmanian blue gum (Eucalyptus globulus Labill.) stands in Galicia. The plots had been re-measured 1, 2 or 3 times, resulting in a total of 308 inventories. The plots were established in pure and even-aged stands covering a wide variety of combinations of age, number of trees per hectare, site quality and method of regeneration. The sample plot size ranged from 375 to 900 m², depending on stand density. The objective was to assess a minimum of 30 trees per plot.
All trees in each plot were numbered; diameters at breast height were measured with a caliper to the nearest 0.1 cm, and heights were measured with a hypsometer to the nearest 0.1 m. The stand variables calculated in each inventory included the quadratic mean diameter, the number of trees per hectare, dominant height, basal area and mean height. A total of 17,588 trees were measured. The summary statistics of the main stand variables are presented in Table 1.

Copula functions
Wang et al. (2008) presented an exhaustive review of different one-parameter copulas which are useful for modeling bivariate tree diameter and height distributions. A copula is a function that joins a multivariate distribution function to its one-dimensional marginal distributions. Suppose X and Y are two continuous random variables and F(x) and G(y) are their marginal distribution functions, respectively. The copula function C combines these two marginals to give the joint distribution function H(x, y) as H(x, y) = C(F(x), G(y)). If both marginal distribution functions and the copula are differentiable, the joint density function can be expressed as:

$$h(x, y) = f(x)\, g(y)\, c\big(F(x), G(y)\big) \qquad (1)$$

where f(x) and g(y) are the marginal density functions, and c(F(x), G(y)) is the copula density.
Frequently used copulas are the Normal (Mardia, 1970) and the Plackett copula (Plackett, 1965). Their densities are given by (Wang et al., 2008):

$$c\big(F(x), G(y)\big) = \frac{1}{\sqrt{1-\rho^2}} \exp\left[-\frac{\rho^2 z_x^2 - 2\rho z_x z_y + \rho^2 z_y^2}{2(1-\rho^2)}\right] \qquad (2)$$

$$c\big(F(x), G(y)\big) = \frac{\omega\left[1 + (\omega-1)\big(F(x) + G(y) - 2F(x)G(y)\big)\right]}{\left\{\left[1 + (\omega-1)\big(F(x) + G(y)\big)\right]^2 - 4\omega(\omega-1)F(x)G(y)\right\}^{3/2}} \qquad (3)$$

where z_x and z_y are specific transformations of x and y, respectively (the standard normal quantiles of F(x) and G(y)), and ω, defined as the cross-product ratio or odds-ratio, is a positive constant for all (x, y) for which neither F nor G assumes the value 0 or 1; ρ is a measure of the degree of association.

Fitting the SBB, Weibull-2P and Logit-logistic (LL-2P) bivariate distributions
The SBB distribution was obtained from SB marginal distributions using the normal copula. In this case, the variables x and y were defined as:

$$x = \frac{D - \varepsilon_1}{\lambda_1}, \qquad y = \frac{H - \varepsilon_2}{\lambda_2}$$

where ε1 and ε2 are the location parameters and λ1 and λ2 are the observed ranges of diameter (D) and height (H), respectively. The values of z_x and z_y were obtained from a four-parameter logistic transformation of x and y:

$$z_x = \gamma_1 + \delta_1 \ln\left(\frac{x}{1-x}\right), \qquad z_y = \gamma_2 + \delta_2 \ln\left(\frac{y}{1-y}\right)$$

where γ and δ are the shape parameters of each SB marginal distribution. These variables have a joint bivariate normal distribution with correlation ρ:

$$f(z_x, z_y) = \frac{1}{2\pi\sqrt{1-\rho^2}} \exp\left[-\frac{z_x^2 - 2\rho z_x z_y + z_y^2}{2(1-\rho^2)}\right] \qquad (4)$$

The parameters ε were predetermined as d_min − 0.5 and h_min − 0.5 for diameter and height, respectively, whereas the parameter λ was set equal to the range of diameters and the range of heights plus one, for the two marginal distributions, respectively.
The Weibull-2P and the LL-2P bivariate distributions were obtained using the Plackett copula and the marginal Weibull and Logit-Logistic density functions:

Weibull density function

$$f(x) = \frac{c}{b}\left(\frac{x-\varepsilon}{b}\right)^{c-1} \exp\left[-\left(\frac{x-\varepsilon}{b}\right)^{c}\right]$$

Logit-logistic density function

$$f(x) = \frac{\lambda}{\sigma\,(x-\varepsilon)(\varepsilon+\lambda-x)} \left[e^{\mu/\sigma}\left(\frac{x-\varepsilon}{\varepsilon+\lambda-x}\right)^{-1/\sigma} + e^{-\mu/\sigma}\left(\frac{x-\varepsilon}{\varepsilon+\lambda-x}\right)^{1/\sigma} + 2\right]^{-1}$$

where x is the diameter (D), y is the height (H) (with the analogous density g(y)), ε is the location parameter, b and c are the scale and shape parameters of the Weibull distribution, with b, c > 0; λ is the scale parameter and μ and σ are the shape parameters of the Logit-logistic distribution, with ε < x < ε + λ.
The parameters were estimated by minimizing the negative log-likelihood function of equations (1) for Weibull-2P and LL-2P and (4) for SBB using the R function optim (R Core Team, 2014). Assuming that the sample observations are independent and identically distributed, the negative log-likelihood function is the sum of single-tree terms (Wang & Rennolls, 2007). Both univariate distributions combined with the Plackett copula in this study (Weibull and LL) have closed-form cumulative distribution functions. If this is not the case, numerical methods should be used for evaluating the cumulative distribution in the model-fitting process.

Comparing the bivariate distributions and goodness-of-fit
Each bivariate model considered in this study has the same number of parameters, namely five: two specific parameters for each marginal distribution and one common parameter. Thus, the values of the negative log-likelihood function were used directly as goodness-of-fit criteria for comparison. The Wilcoxon signed-rank test was used to compare the related samples of the negative log-likelihood function calculated for each sample plot and each of the three distributions. This is a non-parametric paired difference test to assess whether the population mean ranks differ when the population cannot be assumed to be normally distributed.

Results and discussion
The means, maxima, minima and standard deviations of the estimated parameters for the three bivariate distributions (bivariate Johnson's SBB, Weibull-2P and LL-2P) are presented in Table 2. The maximum likelihood estimation converged for all plots and for all three bivariate distributions. In a study in Chinese fir plantations (Cunninghamia lanceolata Lamb.), the estimation did not converge for a number of sample plots for the LL-2P bivariate and bivariate beta distributions, probably because these plots had J-shaped marginal distributions (Wang & Rennolls, 2007). Our good results could be due to the fact that all sample plots were installed in even-aged forests.
Table 3 presents the between-model comparative performance of the three bivariate distributions in terms of their goodness-of-fit statistics and the Wilcoxon signed-rank test. The best results were obtained with the Logit-logistic marginal distribution combined with the Plackett copula. Wang et al. (2008), in a study of Chinese fir plantations comparing five different copulas including the Normal and the Plackett, found that the normal copula showed the best results. In this study, we cannot compare the copulas directly because we used different marginal distributions with each copula. Moreover, as the authors pointed out, the age range of the Chinese fir plantations used in their study was very limited and older stands had different structures influencing the observed outcomes.
The SBB distribution showed better results in terms of goodness-of-fit statistics than the Weibull distribution (Table 3), although the differences were not significant. The better performance of LL-2P over SBB and Weibull was expected since the logit-logistic univariate distribution is more flexible than the other two, covering a wide range of skewness-kurtosis combinations. However, the good results of the Weibull distribution were unexpected, because the Weibull univariate turned out to be the least flexible of the three univariate distributions used. The reason for this may be the very regular shape of the marginal diameter and height distributions of our even-aged stands. Moreover, it also should be taken into account that the locations (ε) and the ranges (λ) of diameters (D) and heights (H) were fixed, affecting especially the performance of the LL-2P and SBB bivariate distributions.
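The paired comparison behind Table 3 can be illustrated with a minimal sketch of the Wilcoxon signed-rank test applied to per-plot negative log-likelihood values. The function name `wilcoxon_signed_rank` and the numbers below are hypothetical and are not the study data. The p-value uses the large-sample normal approximation without tie or continuity corrections, which is an assumption of this sketch. Statistical packages such as R's `wilcox.test` or SciPy's `scipy.stats.wilcoxon` provide more complete implementations.

```python
import math


def wilcoxon_signed_rank(a, b):
    """Two-sided Wilcoxon signed-rank test for paired samples a and b.

    Returns (W+, W-, p), where p comes from the normal approximation
    (no tie or continuity correction; an assumption of this sketch).
    """
    # Zero differences carry no sign information and are dropped.
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, averaging ranks within ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

    # Under the null hypothesis, W has mean n(n+1)/4 and variance n(n+1)(2n+1)/24.
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (min(w_plus, w_minus) - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value
    return w_plus, w_minus, p


# Hypothetical negative log-likelihoods for eight plots under two models.
nll_ll2p = [210.4, 198.7, 305.2, 187.9, 250.8, 221.5, 176.5, 264.8]
nll_weibull2p = [212.1, 199.5, 304.9, 190.2, 252.0, 223.4, 177.1, 266.3]

w_plus, w_minus, p_value = wilcoxon_signed_rank(nll_ll2p, nll_weibull2p)
print(f"W+ = {w_plus}, W- = {w_minus}, p = {p_value:.4f}")
```

Because all three models have five parameters, a smaller negative log-likelihood for a plot directly indicates the better fit, so the sign of each paired difference is meaningful without any penalty term.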
Both the normal and the Plackett copulas performed well in modeling the joint distribution of tree diameters and heights. They could be easily extended for modelling multivariate distributions involving other tree variables. However, it should be noted that the normal copula, in general, does not have a closed form for its joint density, except for the Normal or Johnson's marginal distributions. Another point to consider is that the Plackett copula requires the marginals to have a closed form for their cumulative distributions (F(x) and G(y) in equation (3)), to avoid numerical methods for evaluating the cumulative distribution in the model-fitting process.
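The role of the closed-form cumulative distributions can be seen in a minimal sketch of the Weibull-2P negative log-likelihood built from the Plackett copula density. This is an illustration under stated assumptions, not the code used in the study (which relied on the R function optim). The function names, the five-tree sample and the trial parameter values are hypothetical, and the locations are fixed at the minimum observed value minus 0.5 as described in the text.

```python
import math


def weibull_pdf(x, eps, b, c):
    """Weibull density with fixed location eps, scale b and shape c."""
    t = (x - eps) / b
    return (c / b) * t ** (c - 1) * math.exp(-(t ** c))


def weibull_cdf(x, eps, b, c):
    """Closed-form Weibull cumulative distribution function."""
    return 1.0 - math.exp(-(((x - eps) / b) ** c))


def plackett_density(u, v, omega):
    """Plackett copula density c(u, v) with cross-product ratio omega > 0."""
    if abs(omega - 1.0) < 1e-12:
        return 1.0  # omega = 1 corresponds to independence
    k = omega - 1.0
    num = omega * (1.0 + k * (u + v - 2.0 * u * v))
    den = ((1.0 + k * (u + v)) ** 2 - 4.0 * omega * k * u * v) ** 1.5
    return num / den


def neg_log_lik(params, trees, eps_d, eps_h):
    """Negative log-likelihood as a sum of single-tree terms."""
    b1, c1, b2, c2, omega = params
    if min(b1, c1, b2, c2, omega) <= 0:
        return math.inf  # outside the admissible parameter space
    total = 0.0
    for d, h in trees:
        u = weibull_cdf(d, eps_d, b1, c1)
        v = weibull_cdf(h, eps_h, b2, c2)
        total -= (
            math.log(weibull_pdf(d, eps_d, b1, c1))
            + math.log(weibull_pdf(h, eps_h, b2, c2))
            + math.log(plackett_density(u, v, omega))
        )
    return total


# Hypothetical (diameter in cm, height in m) pairs for one plot.
trees = [(12.3, 14.1), (15.8, 17.0), (18.2, 18.4), (21.5, 20.2), (25.1, 21.9)]
eps_d = min(d for d, _ in trees) - 0.5
eps_h = min(h for _, h in trees) - 0.5

# Trial values for (b1, c1, b2, c2, omega); an optimizer would search over these.
nll = neg_log_lik((8.0, 1.5, 5.0, 1.5, 10.0), trees, eps_d, eps_h)
print(f"negative log-likelihood at trial parameters: {nll:.3f}")
```

A function of this form can be passed to a general-purpose minimizer. If a marginal lacked a closed-form cumulative distribution, `weibull_cdf` would have to be replaced by numerical integration inside every likelihood evaluation, which is the cost the text refers to.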

Table 1. Summary of the main descriptive statistics of stand variables.

Table 2. Mean values, maximum, minimum and standard deviation of the parameters for the three bivariate distributions compared.