Short communication
Comparing Johnson’s SBB, Weibull and Logit-Logistic bivariate distributions for modeling tree diameters and heights using copulas
Departamento de Biología de Organismos y Sistemas. Universidad de Oviedo. Escuela Politécnica de Mieres. Mieres, Spain
Unidad de Gestión Forestal Sostenible (UXFS). Departamento de Ingeniería Agroforestal. Escuela Politécnica Superior. Universidad de Santiago de Compostela. Lugo, Spain
Abstract
Aim of study: In this study we compare the accuracy of three bivariate distributions: Johnson’s S_{BB}, Weibull-2^{P} and LL-2^{P} functions for characterizing the joint distribution of tree diameters and heights.
Area of study: North-West of Spain.
Material and methods: Diameter and height measurements of 128 plots of pure and even-aged Tasmanian blue gum (Eucalyptus globulus Labill.) stands located in the North-West of Spain were considered in the present study. The S_{BB} bivariate distribution was obtained from S_{B} marginal distributions using a Normal Copula based on a four-parameter logistic transformation. The Plackett Copula was used to obtain the bivariate models from the Weibull and Logit-logistic univariate marginal distributions. The negative logarithm of the maximum likelihood function was used to compare the results and the Wilcoxon signed-rank test was used to compare the related samples of these logarithms calculated for each sample plot and each distribution.
Main results: The best results were obtained by using the Plackett copula and the best marginal distribution was the Logit-logistic.
Research highlights: The copulas used in this study have shown a good performance for modeling the joint distribution of tree diameters and heights. They could be easily extended for modelling multivariate distributions involving other tree variables, such as tree volume or biomass.
Keywords: Plackett copula; normal copula; Eucalyptus globulus.
Citation: Gorgoso-Varela, J.J., García-Villabrille, J.D., Rojo-Alboreca, A., Gadow, K.v., Álvarez-González, J.G. (2016). Comparing Johnson’s S_{BB}, Weibull and Logit-Logistic bivariate distributions for modeling tree diameters and heights using copulas. Forest Systems, Volume 25, Issue 1, eSC07. http://dx.doi.org/10.5424/fs/2016251-08487.
Received: 18 Aug 2015. Accepted: 21 Dec 2015
Copyright © 2016 INIA. This is an open access article distributed under the Creative Commons Attribution License (CC by 3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Funding: Ministerio de Ciencia e Innovación (project AGL2010-22308-C02-01), European Union ERDF programme (2011-2013), and the Xunta de Galicia.
Competing interests: The authors have declared that no competing interests exist.
Correspondence should be addressed to Juan G. Álvarez-González: juangabriel.alvarez@usc.es
Introduction
Stand volume, one of the most important variables in forest management, is usually estimated based on sampled tree diameters and heights (Wang & Rennolls, 2007). The common practice is to obtain the height data from a subsample of trees for which diameters are available, and to fit an empirical height-diameter relationship to estimate the average height per diameter class. Tree volume is then estimated using an individual-tree volume equation. Although this approach may appear satisfactory, it is often not appropriate because one tends to ignore the fact that height may vary considerably for a given diameter due to genetic, environmental or silvicultural factors. Therefore, the height residuals are seldom homoscedastic and normally distributed and in many forests the variance about the diameter-height regression is heterogeneous (Zucchini et al., 2001).
An alternative approach for improving stand volume estimation, which takes into account those variations, involves the use of a bivariate distribution (Zucchini et al., 2001; Wang et al., 2008; Mønness, 2015). The joint bivariate distribution of tree diameter and height provides a detailed impression of the relationship between the two variables, which is not given by the two marginal distributions (Rupsys & Petrauskas, 2010). Moreover, bivariate distributions of diameter and height are also useful for assessing timber value based on price sizes (Schreuder & Hafley, 1977) and stand structural diversity (Staudhammer & LeMay, 2001).
Hence there has been considerable interest in identifying suitable bivariate distributions to describe diameter-height frequency data. For many years, the bivariate extension of the S_{B} distribution, the S_{BB} (Johnson, 1949), was the only bivariate distribution used for modeling tree diameter-height frequency data (e.g. Hafley & Schreuder, 1977; Knoebel & Burkhart, 1991; Tewari & Gadow, 1999; Castedo-Dorado et al., 2001; Zucchini et al., 2001). Johnson’s S_{BB} is developed by applying a four-parameter logistic transformation to each of the component variables of a standard bivariate normal distribution (Johnson, 1949; Rennolls & Wang, 2005). Constructing any other analytic bivariate distribution without resorting to a transformation of a bivariate normal distribution is complicated (Wang & Rennolls, 2007). Copula functions, however, provide a general way of constructing multivariate distributions. In recent years, several authors have used the approach described by Sklar (1973), which builds a multivariate distribution from its one-dimensional marginal distributions (e.g. Li et al., 2002; Wang & Rennolls, 2007; Wang et al., 2008).
The objective of the present study is to fit and compare the accuracy of three bivariate distributions: Johnson’s S_{BB}, Weibull and Logit-Logistic fitted to diameter-height data from pure and even-aged stands of Eucalyptus globulus in Northwestern Spain. The Weibull and the Logit-Logistic (LL) bivariate distributions, denoted as Weibull-2^{P} and LL-2^{P}, were obtained from Weibull and LL marginal distributions by using the Plackett copula whereas the S_{BB} bivariate distribution was obtained from S_{B} marginal distributions using the Normal copula, i.e., a four-parameter logistic transformation to each of the component variables.
Material and methods
Data
All tree diameters and heights were measured in 128 field plots in Tasmanian blue gum (Eucalyptus globulus Labill.) stands in Galicia. The plots had been re-measured 1, 2 or 3 times, resulting in a total of 308 inventories. The plots were established in pure and even-aged stands covering a wide variety of combinations of age, number of trees per hectare, site quality and method of regeneration. The sample plot size ranged from 375 to 900 m^{2}, depending on stand density. The objective was to assess a minimum of 30 trees per plot.
All trees in each plot were numbered; diameters at breast height were measured with a caliper to the nearest 0.1 cm, and heights were measured with a hypsometer to the nearest 0.1 m. The stand variables calculated in each inventory included the quadratic mean diameter, the number of trees per hectare, dominant height, basal area and mean height. A total of 17,588 trees were measured. The summary statistics of the main stand variables are presented in Table 1.
Table 1. Summary of the main descriptive statistics of stand variables.
Copula functions
Wang et al. (2008) presented an exhaustive review of different one-parameter copulas which are useful for modeling bivariate tree diameter and height distributions. A copula is a function that builds a multivariate distribution function from its one-dimensional marginal distributions. Suppose X and Y are two continuous random variables and F(x) = Pr(X ≤ x) and G(y) = Pr(Y ≤ y) are their marginal cumulative distribution functions, respectively. The copula function C combines these two marginals to give the joint distribution function H(x, y) as H(x, y) = C(F(x), G(y)). If both marginal distribution functions and the copula are differentiable, the joint density function can be expressed as:
h(x, y) = f(x) g(y) c(F(x), G(y))    (1)
where f(x) and g(y) are the marginal density functions, and c(F(x), G(y)) is the density of the copula used.
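The step from the joint distribution function H to the joint density follows from the chain rule. A short derivation is sketched here; the symbols h (joint density), c (copula density) and the auxiliary variables u and v are notation assumed for this sketch.

```latex
\begin{aligned}
h(x, y) &= \frac{\partial^{2} H(x, y)}{\partial x \, \partial y}
         = \frac{\partial^{2}}{\partial x \, \partial y}\, C\big(F(x), G(y)\big) \\
        &= \left.\frac{\partial^{2} C(u, v)}{\partial u \, \partial v}\right|_{u = F(x),\; v = G(y)}
           \cdot \frac{dF(x)}{dx} \cdot \frac{dG(y)}{dy} \\
        &= c\big(F(x), G(y)\big)\, f(x)\, g(y)
\end{aligned}
```

With the independence copula C(u, v) = uv the copula density is 1 everywhere, and the joint density reduces to the product of the two marginal densities.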
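Frequently used copulas are the Normal (Mardia, 1970) and the Plackett copula (Plackett, 1965). Their densities are given by (Wang et al., 2008):
Normal copula
c(F(x), G(y)) = (1 - ρ^{2})^{-1/2} exp{-[ρ^{2}(z_{x}^{2} + z_{y}^{2}) - 2ρ z_{x} z_{y}] / [2(1 - ρ^{2})]}    (2)
Plackett copula
c(F(x), G(y)) = ω[1 + (ω - 1)(F(x) + G(y) - 2F(x)G(y))] / {[1 + (ω - 1)(F(x) + G(y))]^{2} - 4ω(ω - 1)F(x)G(y)}^{3/2}    (3)
where z_{x} and z_{y} are specific transformations of x and y, respectively; ρ is a measure of the degree of association; and ω, defined as the cross-product ratio or odds ratio, is a positive constant for all (x, y) for which neither F nor G assumes the value 0 or 1.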
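As a minimal sketch, the two copula densities can be coded in their standard textbook forms as functions of u = F(x) and v = G(y). The function names are hypothetical, and taking z_{x} and z_{y} as standard normal quantiles of u and v is the usual definition of the Normal copula, assumed here rather than taken from the text.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the quantile transform


def normal_copula_density(u, v, rho):
    """Normal (Gaussian) copula density at (u, v), with 0 < u, v < 1 and |rho| < 1."""
    zx = _STD_NORMAL.inv_cdf(u)  # z_x = Phi^{-1}(F(x))
    zy = _STD_NORMAL.inv_cdf(v)  # z_y = Phi^{-1}(G(y))
    one_minus_r2 = 1.0 - rho * rho
    expo = -(rho * rho * (zx * zx + zy * zy) - 2.0 * rho * zx * zy) / (2.0 * one_minus_r2)
    return math.exp(expo) / math.sqrt(one_minus_r2)


def plackett_copula_density(u, v, omega):
    """Plackett copula density at (u, v), with 0 < u, v < 1 and omega > 0."""
    k = omega - 1.0
    num = omega * (1.0 + k * (u + v - 2.0 * u * v))
    den = ((1.0 + k * (u + v)) ** 2 - 4.0 * omega * k * u * v) ** 1.5
    return num / den
```

Both functions return 1 at independence (ρ = 0 and ω = 1, respectively), which gives a quick sanity check before they are combined with marginal densities.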
Fitting the S_{BB}, Weibull-2^{P} and Logit-logistic (LL-2^{P}) bivariate distributions
The S_{BB} distribution was obtained from S_{B} marginal distributions using the normal copula. In this case, the variables x and y were defined as x = (D - ε_{1})/λ_{1} and y = (H - ε_{2})/λ_{2}, where ε_{1} and ε_{2} are the location parameters and λ_{1} and λ_{2} are the observed ranges of diameter (D) and height (H), respectively. The values of z_{x} and z_{y} were obtained from a four-parameter logistic transformation of x and y: z_{x} = γ_{1} + δ_{1} log(x/(1 - x)) and z_{y} = γ_{2} + δ_{2} log(y/(1 - y)). These variables have a joint normal bivariate distribution with correlation ρ:
The parameters ε were predetermined as d_{min}-0.5 and h_{min}-0.5 for diameter and height, respectively, whereas the parameter λ was set equal to the range of diameters and the range of heights plus one, for the two marginal distributions, respectively.
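The construction described above can be sketched in a few lines. The density below is the bivariate standard normal density of (z_{x}, z_{y}) multiplied by the Jacobian of the transformation from (D, H), which is the standard form of Johnson's S_{BB}; the function names and the tuple layout of the parameters are assumptions made for illustration.

```python
import math


def sb_location_range(values):
    """Fixed S_B location and range as described above: eps = min - 0.5, lam = (max - min) + 1."""
    lo, hi = min(values), max(values)
    return lo - 0.5, (hi - lo) + 1.0


def sb_z(value, eps, lam, gamma, delta):
    """Four-parameter logistic transformation z = gamma + delta * log(x / (1 - x))."""
    x = (value - eps) / lam  # rescaled to the open interval (0, 1)
    return gamma + delta * math.log(x / (1.0 - x))


def sbb_density(d, h, dia, hei, rho):
    """S_BB density at (d, h); dia and hei are (eps, lam, gamma, delta) tuples."""
    zx = sb_z(d, *dia)
    zy = sb_z(h, *hei)
    # Jacobian of (D, H) -> (z_x, z_y): delta * lam / ((v - eps) * (eps + lam - v)) per variable
    jac = 1.0
    for v, (eps, lam, _gamma, delta) in ((d, dia), (h, hei)):
        jac *= delta * lam / ((v - eps) * (eps + lam - v))
    q = (zx * zx - 2.0 * rho * zx * zy + zy * zy) / (1.0 - rho * rho)
    return jac * math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(1.0 - rho * rho))
```

Because ε is set half a unit below the smallest observation and λ exceeds the observed range by one unit, every observed diameter and height maps strictly inside (0, 1), so the logarithm in the transformation is always defined.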
The Weibull-2^{P} and the LL-2^{P} bivariate distributions were obtained using the Plackett copula and the marginal Weibull and Logit-Logistic density functions:
Logit-logistic density function
where x is the diameter (D), y is the height (H), ε is the location parameter, b and c are the scale and shape parameters of the Weibull distribution, with b, c > 0; λ is the scale parameter and μ and σ are the shape parameters of the Logit-logistic distribution, with ε < x < ε + λ; - ∞ < ε < ∞; - ∞ < μ < ∞; λ > 0; σ > 0.
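A minimal sketch of the two marginals follows, using the parameter names listed above. The Logit-logistic forms are derived from its definition (the logit of the rescaled variable follows a logistic distribution with parameters μ and σ). Whether the Weibull marginal carries the location ε cannot be recovered from the text, so it is passed as an explicit argument here; the function names are hypothetical.

```python
import math


def weibull_pdf(x, eps, b, c):
    """Weibull density with location eps, scale b > 0 and shape c > 0."""
    if x <= eps:
        return 0.0
    t = (x - eps) / b
    return (c / b) * t ** (c - 1.0) * math.exp(-(t ** c))


def weibull_cdf(x, eps, b, c):
    """Closed-form Weibull cumulative distribution function."""
    if x <= eps:
        return 0.0
    return 1.0 - math.exp(-(((x - eps) / b) ** c))


def ll_cdf(x, eps, lam, mu, sigma):
    """Closed-form Logit-logistic CDF on eps < x < eps + lam."""
    r = (x - eps) / (eps + lam - x)
    return 1.0 / (1.0 + math.exp(mu / sigma) * r ** (-1.0 / sigma))


def ll_pdf(x, eps, lam, mu, sigma):
    """Logit-logistic density, i.e. the derivative of ll_cdf with respect to x."""
    r = (x - eps) / (eps + lam - x)
    a = math.exp(mu / sigma) * r ** (-1.0 / sigma)
    return lam / (sigma * (x - eps) * (eps + lam - x)) / (a + 1.0 / a + 2.0)
```

Both cumulative distribution functions are available in closed form, which is what allows F(x) and G(y) to be passed straight into the Plackett copula without numerical integration.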
The parameters were estimated by minimizing the negative log-likelihood function of equations (1) for Weibull-2^{P} and LL-2^{P} and (4) for S_{BB} using the R function optim (R Core Team, 2014). Assuming that the sample observations are independent with identical distributions, the negative log-likelihood function is the sum of single-tree terms (Wang & Rennolls, 2007). Both univariate distributions considered in this study to develop bivariate distributions using the Plackett copula (Weibull-2^{P} and LL-2^{P}) have a closed form of their cumulative distributions. If this is not the case, numerical methods should be used for evaluating the cumulative distribution in the model-fitting process.
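The objective function for one plot can be sketched as the sum of single-tree terms. The example below uses Weibull marginals with the Plackett copula; the function names, the parameter ordering and the fixed locations eps_d and eps_h are assumptions for illustration. The study minimized this kind of function with R's optim; the sketch only builds the objective and leaves the choice of optimizer open.

```python
import math


def weibull_pdf(x, eps, b, c):
    t = (x - eps) / b
    return (c / b) * t ** (c - 1.0) * math.exp(-(t ** c))


def weibull_cdf(x, eps, b, c):
    return 1.0 - math.exp(-(((x - eps) / b) ** c))


def plackett_density(u, v, omega):
    k = omega - 1.0
    num = omega * (1.0 + k * (u + v - 2.0 * u * v))
    return num / ((1.0 + k * (u + v)) ** 2 - 4.0 * omega * k * u * v) ** 1.5


def neg_log_likelihood(params, trees, eps_d=0.0, eps_h=0.0):
    """Sum of single-tree terms -log h(d, h) for Weibull marginals and a Plackett copula.

    params = (b_d, c_d, b_h, c_h, omega); trees is a list of (diameter, height) pairs.
    Returns math.inf outside the parameter space so an optimizer can reject the point.
    """
    b_d, c_d, b_h, c_h, omega = params
    if min(b_d, c_d, b_h, c_h, omega) <= 0.0:
        return math.inf
    total = 0.0
    for d, h in trees:
        if d <= eps_d or h <= eps_h:
            return math.inf
        u = weibull_cdf(d, eps_d, b_d, c_d)
        v = weibull_cdf(h, eps_h, b_h, c_h)
        dens = (weibull_pdf(d, eps_d, b_d, c_d)
                * weibull_pdf(h, eps_h, b_h, c_h)
                * plackett_density(u, v, omega))
        if dens <= 0.0:
            return math.inf
        total -= math.log(dens)
    return total
```

At ω = 1 the copula density equals 1, so the function reduces to the sum of the two marginal negative log-likelihoods; this is a convenient check before handing it to an optimizer.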
Comparing the bivariate distributions and goodness-of-fit
Each bivariate model considered in this study has the same number of parameters, namely five: two specific parameters for each marginal distribution and one common dependence parameter (ρ or ω). Thus, the values of the negative log-likelihood function can be compared directly and were used as the goodness-of-fit criterion. The Wilcoxon signed-rank test was used to compare the related samples of the negative log-likelihood function calculated for each sample plot and each of the three distributions. This is a non-parametric paired difference test to assess whether the population mean ranks differ when the population cannot be assumed to be normally distributed.
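The paired comparison can be illustrated with a small standard-library sketch. It computes the signed-rank sums, a two-sided p-value from the large-sample normal approximation (without tie or continuity corrections, an assumption of this sketch and not necessarily the variant used in the study), and the proportion of plots in which one model has the lower negative log-likelihood. All names are hypothetical.

```python
import math
from statistics import NormalDist


def wilcoxon_signed_rank(a, b):
    """Return (W+, W-, two-sided p) for paired samples a and b (normal approximation)."""
    diffs = [x - y for x, y in zip(a, b) if x != y]  # zero differences are dropped
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:  # assign average ranks to tied absolute differences
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (min(w_plus, w_minus) - mean) / sd
    return w_plus, w_minus, min(1.0, 2.0 * NormalDist().cdf(z))


def lower_ratio(row_nll, col_nll):
    """Proportion of plots in which the row model has the lower negative log-likelihood."""
    return sum(r < c for r, c in zip(row_nll, col_nll)) / len(row_nll)
```

For example, wilcoxon_signed_rank(nll_model_a, nll_model_b) would take two lists holding one negative log-likelihood value per inventory for each model, in the same plot order.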
Results and discussion
The means, maxima, minima and standard deviations of the estimated parameters for the three bivariate distributions (Johnson’s S_{BB}, Weibull-2^{P} and LL-2^{P}) are presented in Table 2. The maximum likelihood estimation converged for all plots and for all three bivariate distributions. In a study of Chinese fir (Cunninghamia lanceolata Lamb.) plantations, the estimation did not converge for a number of sample plots with the LL-2^{P} and bivariate beta distributions, probably because these plots had J-shaped marginal distributions (Wang & Rennolls, 2007). Our good results could be due to the fact that all sample plots were installed in even-aged forests.
Table 2. Mean values, maximum, minimum and standard deviation of the parameters for the three bivariate distributions compared.
Table 3 presents the between-model comparative performance of the three bivariate distributions in terms of their goodness-of-fit statistics and the Wilcoxon signed-rank test. The best results were obtained with the Logit-logistic marginal distribution combined with the Plackett copula. Wang et al. (2008), in a study of Chinese fir plantations comparing five different copulas, including the Normal and the Plackett, found that the Normal copula gave the best results. In this study, we cannot compare the copulas directly because each copula is combined with different marginal distributions. Moreover, as those authors pointed out, the age range of the Chinese fir plantations used in their study was very limited and older stands had different structures influencing the observed outcomes.
Table 3. Between-model comparative performance of these three models in terms of their goodness-of-fit statistics and the Wilcoxon test. Ratio is the proportion of cases in which the row distribution model had a lower value of the negative log-likelihood function than the column distribution.
The S_{BB} distribution showed better results in terms of goodness-of-fit statistics than the Weibull distribution (Table 3), although the differences were not significant. The better performance of LL-2^{P} over S_{BB} and Weibull was expected since the logit-logistic univariate distribution is more flexible than the other two, covering a wide range of skewness-kurtosis combinations. However, the good results of the Weibull distribution were unexpected, because the Weibull univariate turned out to be the least flexible of the three univariate distributions used. The reason for this may be the very regular shape of the marginal diameter and height distributions of our even-aged stands. Moreover, it also should be taken into account that the locations (ε) and the ranges (λ) of diameters (D) and heights (H) were fixed, affecting especially the performance of LL-2^{P} and S_{BB} bivariate distributions.
Both the normal and the Plackett copulas performed well in modeling the joint distribution of tree diameters and heights. They could easily be extended to model multivariate distributions involving other tree variables. However, it should be noted that the normal copula, in general, does not have a closed form for its joint density, except for the Normal or Johnson’s marginal distributions. Another point to consider is that the Plackett copula requires marginals with a closed-form cumulative distribution (F(x) and G(y) in equation (3)) to avoid numerical methods for evaluating the cumulative distribution in the model-fitting process.
References
Castedo-Dorado F, Ruiz-González AD, Álvarez-González JG, 2001. Modelización de la relación altura-diámetro para Pinus pinaster Ait. en Galicia mediante la función de densidad bivariante S_{BB}. Invest Agrar Sist Recur For 10(1): 111-125.
Hafley WL, Schreuder HT, 1977. Statistical distributions for fitting diameter and height data in even-aged stands. Can J For Res 7(3): 481-487. http://dx.doi.org/10.1139/x77-062
Johnson NL, 1949. Bivariate distributions based on simple translation systems. Biometrika 36: 297-304. http://dx.doi.org/10.1093/biomet/36.3-4.297
Knoebel BR, Burkhart HE, 1991. A bivariate distribution approach to modelling forest diameter distributions at two points of time. Biometrics 47: 241-253. http://dx.doi.org/10.2307/2532509
Li F, Zhang L, Davis CJ, 2002. Modeling the joint distribution of tree diameters and heights by bivariate generalized Beta distribution. For Sci 48(1): 47-58.
Mardia KV, 1970. Families of bivariate distributions. Griffin, London, UK. 231 pp.
Mønness E, 2015. The bivariate power-normal distribution and the bivariate Johnson system bounded distribution in forestry, including height curves. Can J For Res 45(3): 307-313. http://dx.doi.org/10.1139/cjfr-2014-0333
Plackett RL, 1965. A class of bivariate distributions. J Am Stat Assoc 60: 516-522. http://dx.doi.org/10.1080/01621459.1965.10480807
R Core Team, 2014. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.
Rennolls K, Wang M, 2005. A new parameterization of Johnson’s SB distribution with application to fitting forest tree diameter data. Can J For Res 35(3): 575-579. http://dx.doi.org/10.1139/x05-006
Rupsys P, Petrauskas E, 2010. The Bivariate Gompertz Diffusion Model for Tree Diameter and Height Distribution. For Sci 56(3): 271-280.
Schreuder HT, Hafley WL, 1977. A useful bivariate distribution for describing stand structure of tree heights and diameters. Biometrics 33: 471-478. http://dx.doi.org/10.2307/2529361
Sklar A, 1973. Random variables, joint distribution functions and copulas. Kybernetika 9: 449-460.
Staudhammer CL, LeMay VM, 2001. Introduction and evaluation of possible indices of stand structural diversity. Can J For Res 31: 1105-1115. http://dx.doi.org/10.1139/x01-033
Tewari VP, Gadow Kv, 1999. Modelling the relationship between tree diameters and heights using S_{BB} distribution. For Ecol Manage 119: 171-176.
Wang M, Rennolls K, 2007. Bivariate Distribution Modeling with Tree Diameter and Height Data. For Sci 53(1): 16-24.
Wang M, Rennolls K, Tang S, 2008. Bivariate Distribution Modeling of Tree Diameters and Heights: Dependency Modeling Using Copulas. For Sci 54(3): 284-293.
Zucchini W, Schmidt M, Gadow Kv, 2001. A model for the diameter-height distribution in an uneven-aged beech forest and a method to assess the fit of such models. Silva Fenn 35(2): 169-183. http://dx.doi.org/10.14214/sf.594
Webpage: www.inia.es/Forestsystems