Sustainable management of the forests require accurate tools for forecasting the different ecosystems services provided under different management schedules, and different environmental and economical scenarios. While growth and timber production models have experienced a huge advance in the last years for the main part of the Mediterranean forest species (see e.g.

Pine nuts from stone pine (

Several tools are nowadays available aiming to carry out predictions of cone and pine nut production at different spatial and temporal scales. While the first efforts in modelling cone production were oriented to develop stand level models (

Despite the wide use of this model, severe limitations are identified. The first one is to the narrow spatial range of applicability of the model, limited to the Northern Plateau region in inner Spain (60,000ha). Direct application of the model to other relevant stone pine regions (such as Andalusia or the Central Range) has resulted in biased predictions, thus specific models for these regions are highly required.

A second main limitation is that the model was constructed using a limited series of data covering ten years (1996-2005), and then validated over an extended series of three additional years (2006-2008). Besides this, due to the absence of an independent series for validation, available data set was split out into a fitting and a validating data set, which resulted in a significant reduction in the size of the fitting data set. The accuracy of the model was then successfully evaluated over different years out of the initial temporal series. However, the constructed model has failed to predict the observed yield decline in the three most recent crops (2012-2013, 2013-2014 and 2014-2015) in the inner part of Spain, a process common to other regions (

Whatever the main cause is, our current model tend to overestimate the production in recent years, thus the construction of a new model using an extended series of cone production covering up to the most recent crops will give insight on the potential cause for this decrease on the production.

Another factor that might be taken into account for modeling the masting pattern of stone pine, additionally to the predominance of drought-prone climatic covariates (

In relation to the original formulation and structure of the model, it included as explanatory covariates stand age and site index, which prevents the sound application of the model in multi-aged stands, where these attributes are not interesting. Apart from this, while common management inventories on even-aged stands may include an estimate of stand age (which can be then applied to compute site index), regional or national scale inventories, as National Forest Inventory, do not include such information, thus limiting the potential use of the model in predicting cone production at these extended spatial scales. Additionally, the annual model does not produce period interval predictions of cone production. Due to this, it is difficult to couple the model with either the existing distance independent tree-level model for the species PINEA2 or the associated stand-level simulator, which makes out predictions at five years steps.

In consequence, the main aim of the paper is to present enhanced and new tools for forecasting cone production in

To build up an enhanced version of the model developed by

To build up a new model for annual tree cone production with validity in the forests of the Central Range, sharing similar characteristics and structure as those previously described.

To present an independent validation on the constructed models based on an approach for upscaling predictions from tree to forest level.

Our main hypothesis are that: (i) the observed recent decay in the cone production can be explained by including climate factors not adequately considered in the previous versions; (ii) the consideration in the model of the delayed inhibitory effect of large cone crops will improve the annual estimates of cone productions; and (iii) the processes controlling cone production in both regions are similar, thus main regional differences are related with the timing (phenology) intensity of the factors rather than on genetic differences.

The NP region is a plain defined by the basin of Duero river, owing two main differentiated units: sandy areas, at an average altitude of 700 – 750 m and limestone plains, at an altitude over 800-850 m. Within the NP ^{th} century, aiming to guarantee soil protection and optimize both cone and timber production, mainly resulting in pure, even-aged stands. However, more complex forests, showing uneven aged structures and mixtures with other conifers and broadleaves, are commonly found in limestone areas.

Within the CR region stone pine occupies the rocky granitic slopes, which define the basins of the Alberche and Tiétar rivers, between the provinces of Ávila and Madrid. These valleys conform a mountainous area, with an orography of hills and small valleys, with steep slopes defining an abrupt landscape. In this territory, stone pine grows at an altitudinal strip between 600 and 1,000 m occupying more than 30,000 ha. Soils are in general poor, mainly composed by sands (70%-80%), though the main differentiating characteristic is the high presence of granite rocks in surface layers, accounting for 50% of the total soil volume. Concerning climate, CR is a moist Mediterranean region, with average annual rainfall exceeding 700 mm but summer precipitation below 50 mm, and warmer temperatures (average temperature: 14.0 ºC with absence of extreme freezing events). The complex orography of these stands as well as the recurrent frequency of forest fires have oriented their main production towards pine nuts and pastures, resulting in mixed, open forests with multi-aged structure, where overmature great cone producers are kept standing and no thinnings are commonly applied (

In 1996, INIA-CIFOR, in cooperation with the forest services of Valladolid, Ávila and Madrid, installed a net of permanent plots in pure even-aged stands of

At plot installation, diameter at breast height, total height, crown diameter, height to crown base and tree coordinates were measured in all the trees within the plot. In a subsample of trees, total age was determined by extracting cores at stump height with a Pressler increment borer. Plots were inventoried in 2001, 2008 and 2015. Main differences among the studied regions in stand and tree level attributes are related with the overmaturity and lower stocking of the stands in CR (

In the five trees closer to the plot center, mature cones were collected every year by specialized climbers. Cones from each tree were classified as healthy or damaged by pests (larvaes from moth

As an independent data set for validating the model we used the data from the annual series of cone production at forest scale managed by the forest services. Annual cone harvesting rights in public forests are sold every year in public auctions, and at the end of the campaign, enterprises should give a report including total weight of cones collected within the forest. These post-crop data series at forest scale extend from 1962 up to the present in NP (Valladolid province) while in the CR there are available estimates of total cone production covering from 2004 to 2012 campaigns.

As in the previous spatiotemporal model, the proposed response variable was the weight (expressed in kg) of sound cones annually collected in each tree

The observed excess of zeros in the data set largely prevents from using classical statistical approaches. Zero-inflated models (

The binary part is commonly modelled using a logistic function assuming a binomial distribution. Given the continuous nature of the abundance part, we propose to use a log-normal distribution, since it avoids zero-truncation (meaning that zero responses can only occur in the binomial part of the model) and offers greater potential for modelling highly skewed distributions (

Where θ represents the probability for having a non-zero value of wc (i.e., obtaining a non null crop), and μ represents the expected value of the logarithmic transformation of wc, conditional to a non-null crop; _{ }and

Given the huge differences observed in individual tree cone production between NP and CR, the ecological differences observed among the two analyzed regions, and the different length of the analyzed series, we proposed to fit independent ZILN models for each region. However, we attempted to maintain a similar structure of the model in both regions. To do that, we evaluated the same covariates acting at different spatial and temporal scales as potential predictors of cone production.

Tree size: diameter at breast height, section at breast height, total height.

Tree level competition: basal area of trees larger than subject tree, ratio dbh / mean squared diameter, ratio tree section / basal area.

Stand attributes: stocking density (stems/ha), basal area, mean squared diameter, Reineke’s stand density index , dominant height, crown cover factor.

Logarithmic, root and inverse transformation of these variables were also evaluated as potential predictors. Stand age and site index, which were predictors in the previous version of the model, were not more considered as potential predictors.

Rainfall: monthly precipitation, starting from the month of January four years before the year of cone maturation and ending up in October of the year of maturation. Apart from single month precipitation, cumulative precipitations in periods covering up to twelve months were also evaluated.

Temperature: number of freezing days per month.

Lagged response: the possible inhibitory delayed effect three years after a bumper crop observed, was taken into account by using data series at regional scale. Average regional value for cone production per ha one, two, three and four years before the current crop were evaluated as potential predictors. This data were obtained from the provincial series of annual cone production.

Rainfall and temperature monthly series were obtained from the most complete meteorological station located within each region: Valladolid meteorological station (41° 39’ 8’’ N - 4° 43’ 24’’ W, 690 m a.s.l.) in NP, and El Tiemblo station (40°24’56” N - 4°30’02” W, 685 m a.s.l.) for CR. The use of a single reference station per region was preferred, because masting pattern had been found to be regionally synchronized rather than locally (

To ensure the compatibility among annual and periodic predictions, climate variables and the lagged response were standardized by substracting to the observed annual value the mean value for the series and dividing by the standard deviation. By doing this, periodic predictions over an average year can be carried out by fixing a value of zero for all the temporal attributes. Additionally, the standardization allows to give insight on the relative importance of each temporal predictor over the response variable.

Covariate selection and model construction were carried out following a sequential procedure. In an initial phase, two separate models for the occurrence of the event (fruiting /no fruiting) and the abundance of the event (total weight of cones, in logarithmic scale, only considering nonzero observations) were separately fitted. In this preliminary step, models were fitted as generalized linear mixed models, including random effects at plot, tree, year and plot x year effects. Starting from a basic model that only includes the intercept, the empirical Bayesian estimates for random effects at different levels were correlated with the evaluated covariates acting at those scales, i.e., year random effects

Model accuracy was first evaluated over the fitting data set. To do that, marginal predictions (assuming unknown random effects equaling zero) over the response variable, expressed in original weight of cones scale, were computed. Antilogarithmic transformation of the response variable was carried out by applying exponential transformation and then multiplying by the factor,

Thereafter we computed mean error (E), level of significance of the error (p-value), root mean square error (RMSE) and modelling efficiency (MEF) at different scales:

Tree x year: comparing observed and predicted values of the response variable

Tree: comparing the observed and predicted total production at tree level for the whole period.

Plot x year: comparing the observed and predicted annual production at plot level.

Plot: comparing the observed and predicted total production at plot level for the whole period.

Year: comparing observed and predicted annual average production at tree level.

The constructed models for each region were validated against the available data from the series of cone production at forest scale previously described. The model for the Northern Plateau was validated over the annual series 2004 – 2014 from five forests (numbers 35, 40, 44, 39 and 79, covering more than 5500 ha, 10% of the total stone pine area within the region). We validated the model for the Central Range over the annual series 2004-2012 from forest 75, the most important in the region, covering 1000 ha. Upscaling the predictions from tree level to forest level was carried out by using data from the forest management inventory. Sampling plots are circular, with a fixed radius of 15 m, and are systematically located in a grid of 200 m x 200 m within each forest. Collected attributes include tree dbh for all the trees within the plot and tree height and age in a subsample of trees. The constructed model was then applied to each tree within the plot, and plot annual production was computed as the sum of the predicted productions for each tree. Forest annual production was then upscaled by using common sampling estimation procedures (

There is a significant difference in cone production at tree level between both regions (

For the common thirteen-years series (1996-2008), no significant differences between regions are found in 2005, 2006 and 2008 (p-value according to non-parametric Kruskall-Wallis test 0.1442, 0.0563, 0.1660 respectively). Uniquely in 1996 and 1997 average crop in NP was higher than in CR. Constant differences in these values point out to huge and permanent spatial differences in the process of fruiting among regions.

Large interannual variability in cone production is found in the two studied regions, with no significant pattern of synchrony between them (Tau´b kendall: 0.0512, p-value: 0.8072;

After an independent sequential procedure for model fitting, a similar set of covariates was selected for each region, revealing large interregional similarities in the factors ruling cone production (

In relation with temporally explicit covariates, results show that in both regions the most important factor ruling cone production is the amount of rainfall from spring to fall during the year previous to flowering, together with the rainfall during the months just before and after flowering, as well as the rainfall during the last year of maturation. While general similitudes are observed, the exact season do not match exactly between the two regions, probably due to phenological shifts along termal gradients (^{nd} year of maturation in CR. Finally, a negative effect of the cumulative sum of the cone crops two and three years before cone maturation is found in both regions. Taking into account that all the temporally explicit covariates are standardized over the mean and standard deviation values for the climate series 1980 – 2010, further use of the model will require these values (

Accuracy of the fitted models over fitting data set was similar for both regions. Predictions were significantly unbiased at all the analyzed spatial and temporal scales, with relative error at scale of prediction always below 6% (

We compared the constructed model with a new refit of the previous model by

Upscaling the predictions by the model to the forest scale led to unbiased results for the annual cone production of the six analyzed forests over the studied series 2004-2014 (^{–1}. Modelling efficiency at forest scale range from 0.61 to 0.90, except for the forest 40, where due to the exceptional bias in 2005-2006 (predicted crop at forest scale over 205000 kg, while collected crop uniquely reached 66000 kg), MEF is reduced down to 0.23. Root mean square error for the analyzed forests varies from 20 to 95 kg.ha.year^{–1}. The model attains to mimic the observed pattern of interannual variability at forest scale, mainly in differentiating among bumper and non-bumper crops (

The constructed model for the Northern Plateau means an improvement over the previous model (

With respect to the predictors explaining interannual variability the main difference with the previous model is the entrance of the model of the delayed effect occurring two and three years after a bumper crop. This inhibition of flower induction by resource depletion due to ripening crops has been previously described in stone pine (

The evaluation of a more complete set of climate covariates and an enlarged series of years resulted in the entrance within the model of the cumulative sum of the rain between May and November the year before flowering, instead of the separated May-June and October-November precipitations that entered in the previous version. This indicates a continuous influence of water availability during the whole vegetative period the year before flowering, which can directly enhance flowering by affecting bud development and differentiation (which was the main previous theory), but also indirectly by means of an increment on needle length, leaf area index, annual net assimilation and accumulated carbohidrates (

In the case of the Central Range, the constructed model conforms the first approach to modelling cone production in the region based on climate, stand, tree and site attributes. Previous modelling efforts, as the one presented by

As expected, our results confirm that the factors affecting cone production in both regions are similar, giving support to the hypothesis of correlated environmental variables rulling spatial synchrony at large scales, as stated by other authors (

The predictions by the models in both region mimicked the observed patterns of spatial and interannual variability within the fitting data set, accurately differentiating good and bad crop in both regions. Model application lead to unbiased estimates of cone production in all the different spatiotemporal scales analyzed, resulting in goodness of fit statistics similar in both regions. In the case of the NP the new fitted model is able to explain the recent decay in cone production during the campaigns extending from 2012 to 2015. This result supports the hypothesis of climate causes of the reduction, aggravated by the inhibitory effect of the exceptional 2010 cone crop, against the hypothesis focusing on biotic agents like

Upscaling the predictions of the model from the tree to the forest scale permits to obtain unbiased estimates of annual cone production for the forest management units. Application of the models using input data from forest management inventories and updated climate data from regional meteorological stations will allow obtaining reliable estimates of cone production by the end of July of the year of cone maturation, four months before the beginning of cone collection, and by the time where public auctions for cone production are carried out. Given the standardized character of the climate variables, it will be possible to make even one-year advanced predictions of cone production, defining a confidence interval for the prediction based on different scenarios of rainfall during the last months of maturation. All this advanced information is essential for a sound planning of all the activities involving the different stakeholders, from the ownerships to the cone collection enterprises, forest services and pine nut industries.

Although the fitted models requires climate predictors extending up to three-years before cone collection, given the standardized character of these predictors it is possible to make average predictions for a given period by assuming a zero value, and even consider the effect of future climate scenarios by assuming variations on the climate standardized predictors within the proposed range. In this sense, the constructed models will be easily implemented within a stand level simulator, as PINEA2 (https://sites.google.com/site/regeneracionnatural/pinea2), allowing to compare and define optimal silviculture under different scenarios. In addition, the required input dataset for the model is compatible with the data from the Spanish National Forest Inventory, permitting to obtain annual estimates of cone production at regional scale.

The authors thank to the Regional Forest Services of Valladolid, Ávila and Comunidad de Madrid their continuous support in maintaining the experimental trials. The authors also wish to thank Rodrigo Gandía, Luis Finat, David Cubero, Rebeca Martín, Enrique Garriga and Ángel Bachiller for their appreciated help in carrying out this research.