Evaluation of direct and indirect methods for modelling the joint distribution of tree diameter and height data with the bivariate Johnson ’ s SBB function to forest stands

Aim of study: In this study, both the direct and indirect methods by conditional maximum likelihood (CML) and moments for fitting Johnson’s SBB were evaluated. To date, Johnson’s SBB has been fitted by either indirect (two-stage) method using well-known procedures for the marginal diameter and heights, or direct methods, where all parameters are estimated at once. Application of bivariate Johnson’s SBB for predicting height and improving volume estimation requires a suitable fitting method. Area of study: E. globulus, P. pinaster and P. radiata stands in northwest Spain. Material and methods: The data set comprised of 308, 184 and 96 permanent sample plots (PSPs) from the aforementioned species. The suitability of the method was evaluated based on height and volume prediction. Indices including coefficient of determination (R2), root mean square Error (RMSE), model efficiency (MEF), Bayesian Information Criterion (BIC) and Hannan-Quinn Criterion (HQC) were used to assess the model predictions. Significant difference between observed and predicted tree height and volumes were tested using paired sample t-test at 5% level for each plot by species. Main results: The indirect method by CML was the most suitable method for height and volume prediction in the three species. The R2 and RMSE for height prediction ranged from 0.994 – 0.820 and 1.454 – 1.676, respectively. The percentage of plot in which the observed and predicted heights were significant was 0.32%. The direct method was the least performed method especially for height prediction in E. globulus. Research highlights: The indirect (two-stage) method, especially by conditional maximum likelihood, was the most suitable method for the bivariate Johnson’s SBB distribution. Additional keywords: conditional maximum likelihood; moments; two-stage method; direct method; tree volume. Authors’ contributions: JJG: conceptualized the study, provided the background of the research and fitted the indirect method of the Johnson’s SBB; FNO: fitted the direct method and wrote the manuscript; RAP: revised, edited and did the final manuscript formatting. Citation: Gorgoso-Varela, J.J., Ogana, F.N., Alonso Ponce, R. (2019). Evaluation of direct and indirect methods for modelling the joint distribution of tree diameter and height data with the bivariate Johnson’s SBB function to forest stands. Forest Systems, Volume 28, Issue 1, e004. https://doi.org/10.5424/fs/2019281-14104 Supplementary material: Figures S1 to S3 accompany the paper on FS’s website. Received: 18 Oct 2018. Accepted:17 Apr 2019. Copyright © 2019 INIA. This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International (CC-by 4.0) License. Funding: This work was supported by the Government of Spain, Department of Economy, Industry and Competitiveness under the Torres Quevedo Contract PTQ-16-08445. The study also was financially supported by the Gobierno del Principado de Asturias through the project entitled “Estudio del crecimiento y producción de Pinus pinaster Ait. en Asturias” (CN-07-094); by the Ministerio de Ciencia e Innovación through the project entitled “Influencia de los tratamientos selvícolas de claras en la producción, estabilidad mecánica y riesgo de incendios forestales en masas de Pinus radiata D. Don y Pinus pinaster Ait. en el Noroeste de España” (AGL2008-02259), and an ongoing research project entitled “Growth and yield modelling of clonal and seedling plantations of Eucalyptus globulus Labill. of NW Spain” (code AGL2010-22308-C02-01), funded by the Ministry of Science and Innovation of Spain and the European Union through the ERDF programme for the period 2011-2013. Competing interests: The authors have declared that no competing interests exist. Correspondence should be addressed to Friday Nwabueze Ogana: fn.ogana@ui.edu.ng


Introduction
Researchers realize that volume, the primary variable that forest managers are interested in, is heavily dependent on both tree diameter and height.A traditional practice is to fit a marginal distribution to the diameter frequency data and then use an empirical heightdiameter relationship to estimate the average height per diameter class and hence the volume (Clutter & Al li son, 1974).Although this approach is satis fac to ry in many situations, it is often not appropriate because the approach tends to ignore the natural relationships between tree diameters and heights by treating them separately (Schreuder & Hafley, 1977;Tewari & von Gadow, 1999).
The traditional method does not quantify the distribution of heights for a given diameter and one approach for modelling the conditional height distribution for the different diameters is to use the height residuals (Gaffrey, 1996).However, it is very seldom that the height residuals are homoscedastic and normally distributed.In most forest stands the variance about the diameter-height regression is not homogeneous (Zucchini et al., 2001).
One of the most important elements of forest structure is the relationship between tree diameters and heights because information about size-class distributions of the trees within a forest stand is important for estimating product yields (Zucchini et al., 2001).The size-class distribution influences the growth potential and hence the current and future economic value of a forest stand (Knoebel & Burkhart, 1991).On the other hand, the social status of a tree, which reflects its further development in the aspects of growth and mor ta li ty, depends not only on its relative diameter, but also on its relative height in a stand (Siipilehto, 2000).Furthermore, knowledge of the height variation, both between and within diameter classes, improves the chances of successfully imitating different types of thinnings (Hafley & Buford, 1985).
The parameters of the Johnson's S BB distribution have been estimated with both the indirect and direct methods.The indirect method fits the bivariate Johnson's S BB distribution by two-stage where the marginals are first fitted separately for the diameter and height; the estimates are then used to compute the parameter of association.Examples of the indirect method that have been reported in forestry literature include Conditional Maximum Likelihood (CML), moments, mode and Knoebel and Burkhart methods (Hafley & Buford, 1985;Tewari & von Gadow, 1997;Knoebel & Burkhart, 1991;Siipilehto, 1996;Tewari & von Gadow, 1999;Cas tedo-Dorado et al., 2001;Li et al., 2002;Oga na, 2018a).Of these indirect methods, CML and moments ha ve been reported as the best methods (Ogana, 2018a).In the direct method, all the parameters in the bivariate dis tri bu tion are estimated simultaneously (Wang, 2005;Mønness, 2015;Gorgoso-Varela et al., 2016;Oga na et al., 2018).However, these two methods (i.e., direct and indirect) of estimating the parameters of Johnson's S BB distribution have never been compared using the same sample.The method used to calibrate the distribution matters in order to evaluate the accuracy of diameter and height predictions (Ogana, 2018a).Thus, the objective of the study was to compa re the indirect fitting method for the Johnson's S B based on two-stage method by the Conditional Maximum Likelihood and moments methods with the direct method where the parameters are estimated simultaneously.

Data set
The data used for this study were obtained from three temperate species -Tasmanian blue gum (Eucalyptus globulus Labill) and two species of pine -Maritime pine (Pinus pinaster Ait) and Monterrey pine (Pinus radiata D. Don) in Spain.E. globulus stands occupy 320,774 ha in Galicia.The pure stands of P. pinaster cover 217,281 ha and 22,523 ha in the regions of Galicia and Asturias, respectively.The plantations of P. radiata occupy 96,177 ha and 25,385 ha in Galicia and Asturias, respectively (MMAMRM, 2011).A total of 308 permanent sample plots (PSPs) from E. globulus, 184 PSPs from P. pinaster stands and 96 PSPs from P. radia ta stan ds were used for this study.The plot sizes ran ged from 375 to 900 m 2 ; to achieve a minimum of 30 trees per plot.Diameter at breast height (Dbh at 1.3 m above the ground) and total height were measured with calliper and hypsometer to a precision of the nearest 0.1 cm and 0.1 m, respectively.A total of 16382, 17845 and 12722 trees were measured from E. globulus, P. pinaster and P. radiata, respectively.The descriptive statistics of the inventory data are presented in Table 1.

The Johnson's Univariate (S B ) and Bivariate
The univariate Johnson's S B distribution (Johnson, 1949a) is expressed as: Eq. ( 1) Where: = location parameter, λ = scale parameter and γ, δ = shape parameters (asymmetry and kurtosis parameters, respectively).The Johnson's S BB (Johnson, 1949b) is the extension of the univariate Johnson's S B distribution; given by: Eq. ( 2) Where: Eq. (3) Eq. (4) Eq. ( 5) Where: ρ is the correlation coefficient between Z d and Z h , the subscripts d and h in equations represent diameter and height, respectively.

Fitting Methods
Indirect method: this method fits the bivariate Johnson's S BB distribution by a two-stage approach where the marginals are first fitted separately for the diameter and height.The estimates from the first stage are then used to compute the correlation parameter (as shown in Eq 3).Generally, fitting Johnson's S BB distribution requires the location (ξ) and scale (λ) to be predetermined (related to minimum and range of the tree variable) while δ and γ can be analytically deduced through different approaches (Schreuder & Hafley, 1977).The method of CML and moments were the indirect methods considered in this study as recommended by Ogana (2018a).
Conditional Maximum Likelihood (CML): The valu es of the parameters were obtained with these expressions: Eq. ( 6) Eq. ( 7) Eq. ( 8) Eq. ( 9) Eq. ( 10) Where: x i (i = 1, 2, …, n) = tree diameters and heights.The location (ξ) and scale (λ) parameters were set to equal minimum diameter times 0.75 (0.75*D min ) and maximum diameter, respectively for the marginal distribution of diameter.These parameters were set at 1.3 and maximum height for the marginal height distribution.The factor 0.75 was adopted because Gorgoso et al.
Method of Moments: this method is based on the functional relationship between the first and second moments of diameter and height and the parameters of the S BB distribution.Ogana (2018a) recently used this method to fit the S BB distribution.It is given by: Eq. ( 11) Eq. ( 12) Eq. ( 13) Eq. ( 14) Where: X = mean diameters and heights of the plot, σ x = plot diameter and height standard deviations and Sd(x) = modified standard deviation.The same procedure with CML was used for λ and ξ.
Direct method: in this method, all parameters in the bivariate Johnson's S BB distribution were estimated simultaneously using maximum likelihood estimation.However, the location (ξ) and scale (λ) parameters were predetermined with the same procedure as the indirect method.This method was applied by Wang (2005), Wang et al. (2008) and Gorgoso-Varela et al. (2016).These authors used the normal copula for this distribution.The joint density of the S BB function derived from the normal copula is given by: Eq. ( 15) and the likelihood function is expressed as: Eq. ( 16) where f(d) and g(h) are marginal Johnson SB densities for diameter (d) and height (h), respectively defined in equation 1; θ parameters to the estimated and N is the number of observations.
The indirect method (CML and moments) estimations were obtained using SAS/STAT TM software (SAS Institute 2003).The computation of the direct method was carried out using the 'optim' (optimal) function in R (R Core Team, 2017).The Nelder-Mead sim plex algorithm (Nelder & Mead, 1965) was used for the 'optim'.

Model Evaluation
The suitability of the direct and indirect methods for fitting S BB distribution was evaluated by tree height and volume predictions.Prediction of expected tree height given diameter and the estimation of stand volume are the main applications of bivariate distribution.Individual tree heights were predicted with the S BB median regression expressed as: Eq. ( 17 Where: v = total volume of the stem with bark (m 3 ); d = tree diameter (cm) and h = tree height (m).The root mean square error reported by the authors for equation 18, 19 and 20 were 0.0116, 0.0156 and 0.0131, respectively.
Where: ̂̂̂̂̂ Different fit indices including coefficient of determination (R 2 ), root mean square error (RMSE), model efficiency (MEF) (Mayer & Butler, 1993), Bayesian Information Criterion (BIC) and Hannan-Quinn Criterion (HQC) were used to assess the adequacy of the methods for height and volume predictions.In addition, significant difference between observed and predicted tree heights and volumes were tested using paired sample t-test at 5% level for each plot by species.Plot was rejected if significant difference occurs between observed height/volume and predicted height/volume for that plot at p<0.05.Pairwise comparison was done with t-test because outlier was not detected in the data.
Eq. ( 21) Eq. ( 22) Eq. ( 23) Eq. ( 24) Eq. ( 25) Where: RSS = residual sum of square, n = sample size, p = number of parameters; Y i = average tree height or volume; Y i is the observed value and Y i is the theoretical value predicted by the model.

Result
The descriptive statistics i.e., the mean, maximum, minimum and standard deviation of the estimated parameters of the bivariate Johnson's S BB distribution are presented in Table 2.The location parameter of the marginal height distribution was set to 1.3 for both direct and indirect methods across the three species.The shape, scale and rho (ρ) parameters of the indirect methods (CML and moments) show little variations.In fact, the standard deviations of the parameters' values were the same at least up to 1 decimal place.This indicates that both CML and moments are closely related.However, estimates from the direct method were different.The standard deviations of the estimated parameters are smaller than that of CML and moments of the indirect method.
The assessment of the suitability of indirect (CML and moments) and direct methods for tree height prediction showed that the indirect method, especially the CML method, had the highest R 2 and the lowest RMSE, MEF, BIC and HQC in E. globulus, P. pinas ter and P. radiata (shown in Table 3).The R 2 , RMSE, MEF, BIC and HQC of CML ranged from 0.994 -0.820, 1.454 -1.676, 0.055 -0.179, 4559 -13400 and 4492 -13335, respectively.However, the direct method had relatively lower R 2 and larger RMSE, MEF, BIC and HQC in E. globulus, P. pinaster and P. radiata relative to the indirect methods.The number of plot in which the observed and predicted values were significant at 5% level (t-test) was 1 out of 308 plots (equivalent to 0.32%) in E. globulus for CML.Eight (2.59%) and 14 (4.45%) plots were significant for moments and the direct methods, respectively in E. globulus.There was no significant difference between the observed and predicted tree height by CML and moments across the 184 and 96 plots in P. pinaster and P. radiata, respectively.But 4 and 5 plots were significant with the direct method in P. pinaster and P. radiata, respecti vely at the specified level.
In the case of tree volume prediction, indirect (CML and moments) and direct methods performed relatively the same (Table 4).The methods had the same R 2 , RMSE and MEF values up to 2 decimal places.The direct method had lowest BIC and HQC in E. globulus and P. radiata.The number of plot in which the observed and predicted values was significant at 5% level for CML was like the result in tree height prediction (i.e., 1, 0, 0).However, 6, 6, and 3 plots were significant in E. globulus, P. pinaster and P. radiata, respectively for the direct method.
The mean of the residuals from the height-diame ter median regression was computed across DBH cla sses and plotted accordingly for the three species.The data were group into DBH classes of 5 cm interval and the mean residual prediction for each class by species was assessed.The graphs showed that CML and moments over-and under-estimated the tree height in the larger diameter classes (> 42.5 cm) in E. globulus and P. pinaster (Fig. 1a to c).Nevertheless, their prediction errors lie within the range of -0.5 to 0.5, except in the larger DBH classes.Conversely, the direct method did not only over-and under-estimate tree height in the larger diameter classes but also overestimate tree heights in the lower diameter classes in E. globulus.The mean residual height prediction for both direct and indirect method were relatively poor in P. radiata.
Furthermore, the graph of residual against predicted tree volume showed that CML and moments and the direct methods occupied the same horizontal band (Fig. 2).Their mean residuals plot lied within the range of -0.6 to 0.6, -0.6 to 0.6 and -0.4 to 0.2 in E. globulus, P. pinaster and P. randiata, respectively.Such plot is ̂ Table 2. Descriptive statistic of the Johnson's S BB parameters for the three species.used to assess whether the residual is homoscedastic (i.e.constant variance) or heteroscedastic.The residual of the methods assessed were relatively constant.

Discussion
The direct and indirect methods of fitting Johnson's S BB have been evaluated.The methods follow similar trend across E. globulus, P. pinaster, and P. radiata with CML being the best for tree height and volume predictions.The number of plots rejected by t-test increased from CML to moments and to the direct method.The percentage of rejections in E. globulus was higher than the other species because some plots of coppice stands were considered, in which the minimum diameter considered was less than 5 cm (value of the minimum diameter considered for Pinus pinaster and Pinus radiata stands).There were also biases in the larger diameter classes (> 42.5 cm) for all methods.Kalbi et al. (2017) attributed this bias to the small number of trees in the larger diameter class.Few trees were found in the diameter classes > 42 cm across the three species.The Johnson's height-diameter function often perform well if parameter ϕ i.e., the determinant of the regression curve is greater than one (Tewari & von Gadow, 1999).In this study, ϕ had values greater than one in some of the sample plots for CML, moments and direct methods.The model performed relatively well in these plots.
Furthermore, the values of the BIC and HQC difference show that the method of CML fits the diameterheight data better than direct methods for height prediction in E. globulus, P. pinaster and P. radiata.As a rule of thumb, a minimum ∆AIC/∆BIC/∆HQC of ≤ 2 is required for two models to be indistinguishable (Ogana, 2018b;Tewari & Singh, 2018).Though, the measures of the accuracy of the estimates (RMSE) and the relative measures of the model performance (MEF) look quite similar for all methods, the CML seems to be slightly better.Parallel observation was reported in   Ogana (2018a), who evaluated the performance of CML, moments, Knoebel and Burkhart (KB) and mode methods for height and volume predictions.The author found CML and moment to be the best methods for predicting tree height and volume in E. camaldulensis Dehn.
It is obvious from this study that when a complex fitting approach does not outperform another simpler one it is convenient to take the latter.Estimating the parameters of Johnson's S BB by CML or moments is much easier and pose little computational difficulty.The CML and moments estimation procedure can even be carried out on Microsoft excel platform.However, the direct method involves complicated algorithm using maximum likelihood technique.This often requires writing the log-likelihood function and the specification of initial values for the parameters which may not achieve convergence (Wang, 2005).Given the complexity of the likelihood function, its gradient is not tractable and then numerical optimization methods have to be used (i.e. through the 'optim' function), which may be less efficient than those based on derivatives.The computation procedure gets more complex as the number of parameters in the distribution increases.Bivariate Johnson's S BB has nine parameters which makes fitting somewhat complicated.Often constraints are imposed on the two location and scale parameters of the bivariate distribution to make the estimates more plausible (Wang & Rennolls, 2007).Mønness (2015) used maximum likelihood estimation to fit the Johnson's S BB distribution and compared the result with power-normal and hyperbolic height prediction model.The prediction from power-normal and hyperbolic models were better than bivariate Johnson S BB fitted with maximum likelihood.Other studies that have used maximum likelihood technique to fit the bivaria te Johnson's S BB distributions include Wang et al. (2008), Gorgoso-Varela et al. (2016) and Ogana et al. (2018).The bivariate Johnson's S BB height-diameter median regression remains the most applied function amidst several bivariate distributions in forestry literature.This is because of its flexibility and, more importantly, because its implied relationship between height and diameter is biologically reasonable (Burkhart & Tome, 2012).The closed-form median regression of the Johnson's S BB height-diameter model has also enhanced its continuous application to forestry.This median regression is frequently used for computing percentile lines wherein bounds on height are set.These percentile lines can be used to show how the variation in height decreases with increasing tree size for a specified diameter (Tewari & von Gadow, 1999).The scatter plots of the Johnson's S BB me dian regression using indirect method fitted by CML from some sample plots in E. globulus, P. pinaster and P. radiata are presen ted in Figures S1 to S3

Conclusion
Modelling the joint distribution of diameter and height by Johnson S BB remains an important tool for assessing the variation of tree height for a given diameter, detailed stand structure and volume estimation.Its accuracy is affected by the method of estimating the parameters of the distribution.In this study, we found the indirect (two-stage) method, especially by conditional maximum likelihood, to be the most suitable method for the bivariate Johnson's S BB distribution.This method predicted tree height and volume in E. globulus, P. pinaster and P. radiata better than the direct method.
(2012) used 0.75 D min for this parameter ξ to achieved good result compared to other cons tra in ts in Pinus pinaster and Pinus radiata.The same species from the same region were used in the study at hand.The CML has been applied to fit the Johnson's S BB by di ffe rent authors, including Knoebel & Burkhart , min = minimum, SD = standard deviation, Skw = skewness, Kurt = kurtosis, d = tree diameter at 1.30 m, h = total height.
) are the estimated parameters from the direct and indirect methods.h = total tree height (m) and d = tree diameter at 1.30 m (cm).The ϕ parameter determines the shape of the regression curve while ρ influences the slope.The relationship can only be linear if ρδ d = δ h and ργ d = γ h .Individual tree volume equation developed by García-Villabrille (2015) for E. globulus and Diéguez-Aranda et al. (2009) for P. pinaster and P. radiata were used to obtain observed and predicted tree volume.The procedure of Li et al. (2002) and Ogana (2018a) was used in this study.Observed tree volume was computed by substituting the observed diameter and height into the individual tree volume equations.The predicted individual tree volumes were obtained from the observed diameter and predicted tree height by S BB height-diameter median regression using direct and indirect methods: E. globulus (García-Villabrille, 2015): Eq. (18) v=2.993 x 10 -5 d 1.973 h 1.059 P. pinaster (Diéguez-Aranda et al., 2009): Eq. (19) v=3.974 x 10 -5 d 1.876 h 1.079 P. radiata (Diéguez-Aranda et al., 2009): Eq. (20) v=4.851 x 10 -5 d 1.883 h 1.004

Figure 1 .
Figure 1.Mean height residual against diameter at 1.30 m of CML, moments and direct methods in (a) E. globulus, (b) P. pinaster and (c) P. radiata. the Johnson's S BB relative to other bivariate functions evaluated in their study.The bivariate Johnson's S BB height-diameter median regression remains the most applied function amidst several bivariate distributions in forestry literature.This is because of its flexibility and, more importantly, because its implied relationship between height and diameter is biologically reasonable(Burkhart & Tome,  2012).The closed-form median regression of the Johnson's S BB height-diameter model has also enhanced its continuous application to forestry.This median regression is frequently used for computing percentile lines wherein bounds on height are set.These percentile lines can be used to show how the variation in height [suppl.].

Figure 2 .
Figure 2. Residual against predicted volume of CML, moments and direct methods across the three species.

Table 1 .
Summary statistics of the tree variables.

Table 3 .
Fit indices and number of plot rejection by t-test at α = 0.05 for height prediction.

Table 4 .
Fit indices and number of plot rejection by t-test at α = 0.05 for volume prediction.