Forecast of frost days based on monthly temperatures

Although frost can cause considerable crop damage, and practices have been developed to mitigate forecasted frost, frost forecasting technologies have not changed for years. This paper reports on a new method based on successive application of two models to forecast the number of monthly frost days for several Community of Madrid (Spain) meteorological stations. The first is an autoregressive integrated moving average (ARIMA) stochastic model that forecasts minimum monthly absolute temperature (tmin) and average monthly minimum temperature (μt) following Box and Jenkins methodology. The second model relates monthly temperatures (tmin, μt) to the minimum daily temperature distribution during one month. Three ARIMA models were identified. They present the same seasonal behaviour (integrated moving average model) and different non-seasonal part: autoregressive model (Model 1), integrated moving average model (Model 2) and autoregressive and moving average model (Model 3). The results indicate that minimum daily temperature (tdmin) for the meteorological stations studied followed a normal distribution each month with a very similar standard deviation through out the years. This standard deviation obtained for each station and each month could be used as a risk index for cold months. The application of Model 1 to predict minimum monthly temperatures produced the best frost days forecast. This procedure provides a tool for crop managers and crop insurance companies to assess the risk of frost frequency and intensity, so that they can take steps to mitigate frost damage and estimate the damage that frost would cause. Additional key words: ARIMA, frost days, mean square error, time series.


Introduction
Crop establishment and development are crucial because of the investment of time and resources involved.Failure of establishment is mainly due to the influence of climatic factors that cannot be controlled (Penning and Van Laar, 1982;Hutchinson, 1991).
Unfortunately, practices to mitigate potential frost damage generally increase crop costs (Hansen et al., 1999).Since freezing temperatures restricting the length of the growing season are responsible for reductions in yield and quality of agricultural crops (Harker, 2002;Faubion, 2003), then minimum temperatures become critical.
Many studies have used monthly temperature, rain, radiation, etc. data to develop models that simulate agro-climatic scenarios (Kuehl et al., 1976;Andersen et al., 2001).Temperature is one of the most important variables in these simulations.Studies with regression functions by Cao and Moss (1989), Jamieson et al. (1995) and Landau et al. (2000), among others, highlight the relevance of temperature in crop growth.Several temperature models have been developed in order to simulate crop response to daily maximums and minimums of one specific region (Jamieson et al., 1995), or with lower accuracy, response to monthly temperatures (Nonhebel, 1993;Castellanos, 1997) or annual oscillations (Fernández, 1992).
Stochastic simulation of weather data to forecast crop yield has become of great importance (Bannayan and Crout, 1999).Dionne et al. (2002) describe crop damage, due to frost, related to crop phenological phase and low temperature tolerance.Knowledge of how temperatures affect crops is basic to selecting and managing crops to avoid risks from extreme temperatures that would seriously reduce yield and market offer (Yang and Chen, 1989;Zhang et al., 1991;Harker, 2002).However, there are very few studies in a given area modeling temperature in relation to frost days as a method to prevent crop damage (Staggenborg et al., 1999).
Air temperature, a temporal series, can be modelled using various techniques, among them the autoregressive integrated moving average (ARIMA) models (McMichael and Hunter, 1972;Kantz and Schreiber, 1997;Montgomery and Zarnowitz, 1998).The aim of this modelling approach is to express current time series values as a linear function of past time series values (the autoregressive component) and current and lagged values of a white noise process (moving average compo-nent) (Box and Jenkins, 1970).In other words, the aim of the ARIMA models is to separate the observed elements into two components: the first, related to the organized part (including tendency, seasonality, cycles); and the second, the random residuals or white noise (García- Barrón and Pita, 2004).For example, Persaud and Chang (1985) used this type of model to relate maximum and minimum daily temperatures with solar radiation.
ARIMA modelling is applicable to stationary series time.The mean, variance and autocorrelation of a stationary series are independent of time; otherwise a series is non-stationary.A non-stationary series may be transformed into a stationary series by differentiating consecutive values (W i = Y i -Y i-1 ) and/or by differentiating consecutive seasonal values if the series is seasonal ARIMA models could be (p,d,q), or (p,d,q) (P,D,Q) s if the series is seasonal, where p and P are the orders of non-seasonal and seasonal autoregressive parameters, q and Q are the orders of non-seasonal and seasonal moving average parameters, and d and D are the numbers of regular and seasonal differences required (as explained above), respectively.Models constructed in this way have been widely used in economics, engineering, etc to explain the time structure of each series and predict its change.The theoretical basis of the proposed models, including required conditions, equations and significance tests, will be explained in the next section.
The aim of this study is to apply these concepts to estimate monthly frost days (FD) based on seasonal ARIMA model forecasted minimum monthly (absolute, t min and average, µ t ) temperatures.This method allows us to estimate frost risk during months where crops could be damaged due to low temperatures.In Comunidad de Madrid this scenario is observed from February to May and from September to December.In addition, no daily temperature series, only monthly minimum series, easier to obtain from meteorological stations, are required.

Data
Meteorological data, including average monthly minimum temperature (µ t ), minimum monthly absolute temperature (t min ) and number of frost days monthly (FD) corresponding to the study years  were obtained from twelve meteorogical stations in Comunidad de Madrid (Fig. 1) at 3º-4º longitude, 40º-41º latitude and 523-1000 m altitude (Table 1).Data missing from the temperature time series at each meteorological station were filled in by linear regression between the monthly values from neighbouring stations (Guerra-Gómez, 1985), checking the homoscedasticity of the series (Buendía, 1985;Sobrino et al., 2003) previously.
The National Meteorological Institute (INM) and the Regional Meteorological Center of Retiro provided, for each station, the minimum daily temperature in Celsius (t d min ).From this the following were calculated: absolute minimum monthly temperature (t min ), average minimum monthly temperature (µ t ) and number of days monthly with <0ºC (FD).
Data for the last 12 months, corresponding to year 2003, were selected to verify the forecasted set of FD, t min and µ t .

Modelling temperature series
Monthly temperature series has a certain random nature, with an evident seasonal component.Fig. 2 shows the actual data used in the model-building process, and illustrates the high degree of non-linearity and seasonality in time.
The univariant analysis of time series developed by Box and Jenkins (1970) (B-J method) was applied to all the meteorological stations with data from 1950 till 2003.Before starting with this methodology it is essential to obtain stationary series (Brockwell and Davis, 1987).Normally the differences of the original series remove the non-stationary features of a time series, such as the time-varying mean and variance.For most economic time series, the values of d and D (mentioned in the introduction) are 0 or 1.Preliminary data analysis found that this is also the case for all time series considered in this work.To determinate the values of D and d empirically, tests of Franses (1991) and Beaulieu and Miron (1993) were applied.
The B-J method consists essentially of three steps: a) identification of possible models, b) model parameter estimation and c) comparison of forecast versus data.Each of these steps will be explained below (for more detail see Bowerman and O' Connell, 1987;and Brockwell and Davis, 1987).

Identification of possible models
At this stage we also need to decide how many autoregressive (p) and moving average (q) parameters  The general form of the seasonal ARIMA model (Box and Jenkins, 1970) with periodicity s (s=12 for monthly data in our case) can be written as: [4] where δ is white noise and a i is white noise referred to that month.B is the backward shift operator, ∆ d = (1-B)d and ∆ D 12 = (1-B12)D are, respectively, the operators for the d th -order monthly difference and the D thorder annual difference.For instance, for D = 1, d = 1: The rest are polynomial terms that involve several unknown parameters to be estimated (φ 1 …φ p , θ 1 …θ q , Φ 1 …Φ P , Θ 1 …Θ Q ).The first two polynomials in Eq. [4] represent, respectively the AR (autorregresive) and MA (moving average) components.The latter two are the seasonal AR and MA polynomials respectively.These components are used to model the cyclical behaviour of a time series around the time-invariant mean.

Estimation of model parameters
Estimation of model parameters obtained in the previous section (φ 1 …φ p , θ 1 …θ q , Φ 1 …Φ P , Θ 1 …Θ Q ) is based on the squared unconditioned minimums (Harvey, 1990).Later, model selection is based on parameters and residual tests such as standard error and t-Student test.Finally, the residual series, the difference between series estimates and observable values was analyzed to check that it was indeed a random series (Harvey and Pierse, 1984), that is, with a null average and normally distributed.
The model selected will be the one that minimizes the mean -square error (MSEt) in the series estimation (Uriel, 1995) defined as: are necessary to yield an effective but still parsimonious model of the process (parsimonious means that it has the fewest parameters and greatest number of degrees of freedom among all models that fit the data).In practice, the numbers of the p or q parameters very rarely need to be greater than 2. The identification step is based on the behaviour of two functions: the simple autocorrelation function, ACF, and the partial autocorrelation function, ACFP (Aguirre, 1994).
ACF gives the correlation strength between two values of the series separated by k values [lag (k)], including the intermediate values (Box and Jenkins, 1976).ACF values, between +1 and -1, are calculated as: [1] N is the number of data in the series, is the variable value at a certain month, is the series average, and its estimator is: [2] ACFP gives the correlation strength between two values of the series separated by k values [lag (k)], but not the intermediate ones.It is estimated as (Bras and Rodriguez-Iturbe, 1993): [3] Normally a significant interval, 1.92 or 1.96 times the standard error, is calculated for both functions to evaluate if the values obtained are significant (Hamilton, 1994).
where: t ij is the estimated monthly temperature, t ij is the observed monthly temperature, i = month number, j = year number, N = number of data.

Comparison of forecast with data
Once the simulation is finished, a model has to be selected from the possible ones based on accuracy according to B-J methodology.
The forecast for the last year ( 2003) with the selected model for each series based on data from 1950 to 2002 were made, with a confidence interval of 95%.The error estimated in this forecast is calculated based on Eq. [7] restricted to year 2003.
When forecast values are within a certain confidence limit of the real data, the models can be considered adequate to forecast the series for the following year (Bowerman and O'Connell, 1987).

Forecast of temperature series
The estimation process is performed on transformed (differenced) data; before the forecasts are generated, the series needs to be integrated (integration is the inverse of differencing) so that the forecasts are expressed in values compatible with the input data.This automatic integration feature is represented by the letter I in the name of the methodology ARIMA.
At this step, the time series (from 1950 till 2003) is used to adjust model parameters.Once finished, the model is used to forecast the following year.

Estimation of number of frost days
A normal probability plot was constructed for each month and each station to check that the underlying probability distribution of minimum daily temperature can be assumed to be normal (Eq. [8]). [8] The essence of such a plot is that points plotted from a normal distribution fall close to a straight line (Devore, 1995).From these linear regressions the mean and standard deviation can be calculated.
Once that normality was checked, we can assume that the absolute minimum monthly temperature (t min ) will be equal to the average minimum monthly temperature (µ t ) minus a coefficient (α) related through the typical deviation (σ t ).That is (Tarquis et al., 1993): [9] Applying Eq. [9], the coefficient was estimated for each station and each month.
Estimated FD, the probability of a minimum daily temperature lower than 0ºC (Diehl et al., 1982), depends on the parameters of the normal distribution, µ t and σ t .

Modelling temperature series
Once that all series have been transformed into stationary series, ACF and ACFP graph were studied (see Fig. 3).There were three possible models for the nonseasonal part and only one model for the seasonal part, depending on the station and temperature analyzed.These are specific cases of the general model shown in Eq. [4].
-Model 1: autoregressive model of order one in the non-seasonal part, and integrated moving average model of order one for the seasonal part.In B-J methodology this would be: (1,0,0),(0,1,1) 12 [11] -Model 2: integrated moving average model of order one in the non-seasonal part, and integrated moving average model of order one for the seasonal part, (0,1,1),(0,1,1) 12 : [12] -Model 3: autoregressive and moving average model of order one in the non-seasonal part, and integrated moving average model of order one for the seasonal part, (1,0,1),(0,1,1) 12 : [13] For both series, and , Model 1 is the most frequently fitted (Table 2).Model 2 µ t time series only fitted two stations (m109 and m342), a water reservoir located in mountain area of Madrid and a hydroelectric power station in a valley, respectively.This model has been fitted for more unstable series than those fitted with Model 1, in which it was not necessary to differentiate the non-seasonal part.These two stations frequently experience temperature inversions due to cold air accumulation in the valleys.The t min series fitted two stations (m109 and m195), a water reservoir (m109) and the city centre surrounded by forest and vegetation (m195), a differentiating it from other stations in the same area.Finally, both time series fitted the station located at Barajas airport, with its higher instability due to winds and turbulence created by aeroplanes.
Coefficient values of autoregressive Model 1 with an average of 0.4 indicate a low correlation between variable value (Tables 3 and 4).Moving average Model 2 indicates that the possible tendency that could exist is due to white noise.Coefficient of an average of 0.86 (Tables 3 and 4) indicate considerable influence of preceding and present noise.
The seasonal part is always an integrated moving average model that yields a random and very weakly auto-correlated series type.Coefficients of this part, of an average of 0.90 (Tables 3 and 4), indicate considerable influence of white noise between consecutive years.

Forecast of temperature series
Forecast temperatures for the twelve stations were well adjusted.An average forecast was calculated for absolute temperatures and minimum averages for the last year according to the chosen confidence interval (95%).Absolute minimum temperature average error was 1.3ºC, with a standard deviation (STD) error of 1.03.Minimum average temperature errors averaged of 1.1ºC, with STD of 0.64.It is observed in general that, for the
twelve stations, the average minimum temperatures adjust better than the absolute minimums (Figs. 4 and 5).
As an example, some data with actual temperatures for three stations are shown in Figs. 4 and 5, each one with a different model.In these Figures the good adjustment of estimated to actual temperatures can be observed.In all months, average forecasted temperatures are not outside the limits of the 95% confidence interval.For m109, as well as for the rest of the stations modelled by Model 2, there is not a significant difference between the model forecast and the monthly average of the series, confirming that this model only reflects the influence of white noise.

Estimation of number of frost days
Estimation of number of frost days monthly (FD) for two crop periods with highest risk per month and per station from the years studied is shown in Table 5.
December is the month with the highest FD at all stations, followed by February and November.From February to May, stations m109 and m169 show the highest accumulated FD value, while from September to December station m100E does so, in agreement with their geographic situation (Fig. 1).a Coefficients correspond to the following models: (1,0,0)(0,1,1) 12 (0,1,1)(0,1,1) 12 (1,0,1)(0,1,1) 12 The same study can be carried out on the STD of FD (see Table 6).One of the first results is that FD standard values follow the same pattern explained earlier for each FD value.Therefore, the month with the highest variation of FD is December, followed by February and then March and November with a similar STD.
The r 2 values obtained from the probability plot analysis (Fig. 6) are greater than 89% for all cases, confirming that a normal distribution can be used to describe daily minimum temperatures.The range of estimated a values varies from 4.2 to 0.204 (Table 7).In general, comparing Tables 5 and 6, an inverse relation between a and the average number of monthly frost days can be seen.The lack of any clear relation between forecasted coefficient errors and location of stations is an indication of the complexity due to several simultane-ous influencing factors (water reservoirs, urban centre, mountain area, etc).
The number of monthly frost days is calculated using forecasted temperatures (t min and µ t ) and estimated σ t and α, as explained previously.
Some of the forecasted frost days are shown in Fig. 7, where for each month studied the monthly average and standard deviation of FD are also presented.For the three stations of Fig. 7, the highest error is in February and November, as in the rest of the stations analyzed.
To evaluate the adjustment of forecasted to actual data, a coefficient error is calculated to compare the difference between both and the monthly standard deviation found (Table 8).We can observe that almost all errors are smaller than 0.5 STD.Only twelve values are larger than STD, mainly in February and November.

Forecast of frost days 515 Figure 1 .Figure 2 .
Figure 1.Map of Spain, with enlarged Community of Madrid.The twelve meteorological stations selected are indicated by a filled circle.The urban area in the central part and the mountain area in the north-west, are bounded by a line.

Figure 5 .Figure 6 .
Figure 5. Absolute minimum monthly temperature series in 2003: original data represented by an empty square () and forecasted, represented by an empty circle (o) for stations m196, m109, m129.Error bars correspond to the selected confidence interval (95%) in the forecast.

Table 1 .
Location of the 12 meteorological stations used in this study (longitude, latitude and altitude)

Table 2 .
ARIMA models fitted for minimum temperature series per station: monthly absolute (t min ) minimum and monthly average minimum (µ t )

Table 3 .
Coefficient values of models obtained per station for monthly average minimum temperature (µ t )

Table 4 .
Coefficient values of the models obtained per station for monthly absolute minimum temperature (t min ) a Coefficients: see Table3.

Table 6 .
Standard deviation (STD) values of the number of monthly frost days (FD)

Table 7 .
α coefficient values estimated per month and per meteorological station

Table 5 .
Estimation of number of monthly frost days per station at various periods: monthly, from February to May (total I), from September to December (total II) and sum of the months shown (total I+II)