Agricultural and environmental information systems: the integrating role of area samples

This article aims to contribute to the improvement of official agricultural and environmental statistics. Methods are applied to integrate information from state agency registers regarding crop area with ground data observed in random area samples. To improve the precision of crop area estimates in small areas (municipalities), methods using ground survey and remote sensing are applied. To improve the temporal resolution of crop area estimators, methods based on time series analysis are applied. Agro-meteorological models are applied to improve crop yield statistics. A method is shown whereby crop rotation models may be a useful tool to forecast changes in the dynamics of the use of natural resources (soil, water and air) by agriculture and to foresee their environmental impact. Finally, a method to update and disaggregate information from territorial censuses on land uses is applied. These methods and models are illustrated in the framework of an information system belonging to the Spanish Ministry of the Environment and Rural and Marine Affairs. The relative improvement offered by each method is assessed by evaluating the precision gain of the proposed crop area estimates versus those currently used by the aforementioned information system.

Additional key words: agrometeorological models, crop rotation models, sampling in time, small area estimation, spatiotemporal econometric models, updating regional censuses and disaggregation of information.


Introduction
Information is key for decision-making, as it allows us to transform uncertainty into risks and to quantify and reduce risks. The greater the quantity and quality of information (precision, timeliness and specificity), the lower the uncertainty and the better the risk conditions in which economic and social actors reach their decisions.
Production of information is a complex task that economic and social actors are not in a position to perform individually, as it requires not only quality data but also the knowledge to select the most relevant data in each case and to interpret them. This is a collective task where public authorities are responsible for data production and private bodies then interpret the data in the specific context where the decision is reached, with the help of economists and other social scientists. This traditional division of reporting tasks is not exact (Wolf et al., 2001), because individual economic actors cooperate with the authorities by supplying data, and the authorities interpret the data they produce in a context that is increasingly global and less specific.
In agriculture, the nature of the information required is very diverse, given the heterogeneity of the agents operating in the sector: farmers, input providers, agrofood industries, marketing agents, politicians and state agencies, among others. Farmers associate risk with factors outside their control which have a large impact on their production costs and on the economic results of their businesses, including weather, prices received and paid, decisions taken by competitors (Kuhlmann and Brodersen, 2001) and environmental restrictions (Burt, 2003). They require concrete and precise information on these factors to allow them to detect deviations of their production plan from its targets and to implement the strategy changes needed to turn the situation around. The systems known as Decision Support Systems, however, have not been widely used by farmers, perhaps because they prefer to use tactics to overcome obstacles caused by factors outside their control, instead of production strategies that may be optimal but rarely lead to overcoming these obstacles (McCown, 2002). Research and development of new production techniques is also required. Food industries and marketing agents require information on crops, and input suppliers require information on the consumption of production means.
Society, through its political representatives, demands farming systems that provide food safety and bio-energy without irreversibly exhausting or polluting natural resources (soil, water and air) and while respecting biodiversity, all in a context of climate change. The information required on these systems has basically been limited to large macro-economic results (domestic supply and demand and the outlook of international markets), while the interaction between the economy and the environment has been relegated to the background. Today, this interaction has moved to the forefront, and so has the demand for information to monitor the impact of agricultural systems on the environment, to control the waste produced and to design sustainable agricultural systems which remain productive over time.
This article applies several statistical methods to produce timely, precise, specific and high-resolution spatiotemporal information, in the framework of an information system. The methods applied are illustrated using the Farming and Environmental Information System belonging to the Spanish Ministry of the Environment and Rural and Marine Affairs. This system comprises elements of diverse origins that were designed for a variety of very different specific purposes, and this article provides methods to adapt them to current information needs. The focus of attention is the bridging role that random area samples can play in transferring the advantages of one element of the system to another and in remedying any deficiencies. This usefulness of random samples is illustrated using the Survey on Surfaces and Crop Yields in Spain (ESYRCE), a key element within the Spanish system.
Surveys, state agency registers and censuses are the basic elements encompassed in an information system. In the Spanish system, we can highlight ESYRCE, a territorial survey originally established to cater for European Union (EU) information demands on member state harvests. In Spain, this information has been gathered so far by means of subjective processes, on the basis of local expert crop area estimates. Although regulations in the EU allow data to be obtained by subjective procedures, they recommend the preferential use of objective methods. Data obtained by objective methods are directly comparable between EU member states, and this is a substantial advantage over subjective data.

Abbreviations used: BLUE (best linear unbiased estimator), BLUP (best linear unbiased predictor), CGMS (crop growth monitoring system), DE (direct estimator), ESYRCE (encuesta sobre superficies y rendimientos de cultivos en España = survey on surfaces and crop yields in Spain), EU (European Union), OLIWIN (agro-meteorological models for the estimation at harvest of olive and vine yield), SAS (statistical analysis software), SAS/IML (interactive matrix language of SAS), SIGA (sistema de información geográfico agrario = geographical information system on farming data), SIGPAC (sistema de información geográfica de parcelas agrícolas = geographic information system to identify agricultural plots), SIMWAT (simulation water balance), UTM (universal transverse mercator), WOFOST (world food studies).
Given this recommendation from the EU, a random sample of areas was designed (ESYRCE). The sample was selected on the basis of the National Topography Map 1:50,000, using as a sampling unit (segment) an area of 700 × 700 m, inscribed in a 1 × 1 km square of the Universal Transverse Mercator (UTM) projection. This is a systematic sample and it covers the entire national territory, supplying high-resolution spatial data on land uses and crop yields. Its results are objective, standardised and directly comparable among Spanish Autonomous Communities, EU member states and third countries (Ambrosio and Gallego, 1998).
State agency registers are an important source of data. However, these data are subject to unknown measurement errors and must be combined with data from random samples for their validation. This article applies a method to integrate data from state agency registers with data from random samples, resulting in integrated data which validate the register data and are more precise than either of the sources they combine. The method is illustrated by integrating data from the Spanish system registers with data from ESYRCE.
In the framework of official information systems, samples are generally designed to obtain precise annual results at national level and for large regions (Autonomous Communities), meaning that data at the level of small regions, provinces and municipalities do not have the precision required for well-informed decisions to be taken. The required level of temporal resolution is also absent: this is the case of ESYRCE, whose results are obtained after the end of the campaign and are frequently not available in time to make decisions. This article applies a method to improve the precision of crop area estimates in small areas and another to provide crop area forecasts, by combining field data with auxiliary information from other sources. Both methods are illustrated using ESYRCE.
In order to prevent undesired environmental impacts, it is especially relevant to forecast the individual land use decisions made by farmers and to assess their environmental impact. This article applies a method to forecast the alternation and rotation of crops using field data observed in area samples. The alternation of crops, combined with auxiliary information on doses of water, fertiliser, herbicides and pesticides, and on greenhouse gas emissions, allows the impact of agricultural systems on the environment to be assessed. In addition, crop rotation models are useful tools for updating territorial censuses. Territorial censuses, such as the Geographic Information System to identify Agricultural Plots (SIGPAC) or the Geographical Information System on Farming Data (SIGA), provide spatial resolution to the information system, but keeping them up to date is costly. The way in which rotation models can be used to update these censuses is shown in this article.
Agro-meteorological models are a useful tool for forecasting crop yields as a function of weather conditions, and this article applies this tool. Area frames are useful for selecting samples of farmers to interview in order to gather information that is not directly observable on the ground, and this article applies a method for areas with extensive agriculture, analogous to that used for intensive farming areas.
A computer programme written for the IML procedure of the SAS statistical package has been specifically created for each one of the methods applied in the case of ESYRCE.

Integration of data from different sources
Data on the same issue are frequently available from a variety of sources, and this article applies a method to integrate the data series registered by Spanish agencies with the data series estimated by ESYRCE. The method is illustrated by applying it to the estimated cultivated surface. To achieve data integration, data from each source must have a statistically significant relationship with the real surface. When the source is a random sample, the data have a statistically significant relationship with the true surface and it is possible to estimate errors on the basis of the sample (Ambrosio et al., 2003). However, when the source is a state agency register, statistical models are required to estimate measurement errors.

The model
Let $y_t$ be the (unknown) surface of a certain crop in year $t$, in a given region (Autonomous Community).
Two series of crop area estimates for $\{y_t;\ t = 1,2,\dots,T\}$ are available. One, $\{\hat{y}_{Et};\ t = 1,2,\dots,T\}$, is the Direct Estimator (DE) used in ESYRCE, which is based solely on field data observed on a random area sample:

$$\hat{y}_{Et} = \frac{M}{m}\sum_{i=1}^{m} y_{it}$$

where $M$ is the number of systematic samples in the population, $m$ is the number of systematic samples selected and $y_{it}$ is the surface of the crop in the $i$-th systematic sample in year $t$. The other, $\{\hat{y}_{At};\ t = 1,2,\dots,T\}$, is based on a state agency register. The statistical properties of the DE estimator, $\hat{y}_{Et}$, are known because it is based on a random sample (Ambrosio et al., 2003). However, the properties of $\hat{y}_{At}$ are unknown and must be assessed using statistical models. To this end, the model

$$\hat{y}_{At} = \beta_0 + \beta_1 y_t + e_t \qquad [1]$$

$$\hat{y}_{Et} = y_t + u_t \qquad [2]$$

is considered, where Eq. [1] is a model of the register error structure, specified as the sum of a fixed component, $\beta_0 + \beta_1 y_t$, and a random component, $e_t$. The random component has zero mean and variance $\sigma_e^2$. In addition, it is assumed that $e_t$ is independent of $y_t$. Eq. [2] describes the statistical properties of $\hat{y}_{Et}$: it is an unbiased estimator of $y_t$, and the error $u_t = \hat{y}_{Et} - y_t$ is random with zero mean and variance $\sigma_u^2$.
Note that the new integrated data, $\hat{y}_t = \hat{y}_{Et} + \gamma v_t$, correct $\hat{y}_{Et}$ by an amount that is a function of the deviations between the crop area estimates of the random sample and the register, $v_t = \hat{y}_{At} - \beta_0 - \beta_1\hat{y}_{Et}$. The correction factor, $\gamma = \beta_1\sigma_u^2/(\sigma_e^2 + \beta_1^2\sigma_u^2)$, depends on the degree of precision, $\sigma_u^2$, of the random estimator. The integrated estimator $\hat{y}_t$ improves on $\hat{y}_{Et}$ because it is more precise: both are unbiased and the integrated data have a lower variance, $V(\hat{y}_t|y_t) \le \sigma_u^2$. The estimate $\hat{y}_{At}$ contributes to improving the precision of $\hat{y}_{Et}$ in a proportion $g = 1 - V(\hat{y}_t|y_t)/\sigma_u^2$.
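The integration step can be sketched numerically. The following is a minimal illustration, assuming the parameters of Eqs. [1] and [2] are known and that the integrated estimator is the inverse-variance combination of the two sources implied by that model. The closed forms and the function name `integrate` are derived for illustration and are not taken from the article's SAS/IML programme.

```python
def integrate(y_E, y_A, beta0, beta1, var_u, var_e):
    """Combine the survey estimate y_E with the register figure y_A.

    Assumes Eq. [1]: y_A = beta0 + beta1*y + e, Var(e) = var_e,
    and Eq. [2]: y_E = y + u, Var(u) = var_u, with e and u independent.
    """
    # Deviation between the register and the survey, v_t in the text.
    v = y_A - beta0 - beta1 * y_E
    # Correction factor implied by the model (an assumption of this sketch).
    gamma = beta1 * var_u / (var_e + beta1 ** 2 * var_u)
    y_hat = y_E + gamma * v
    # Variance of the integrated estimate, conditional on the true surface.
    var_hat = var_u * var_e / (var_e + beta1 ** 2 * var_u)
    # Proportional gain in precision, g in the text.
    gain = 1.0 - var_hat / var_u
    return y_hat, var_hat, gain


# Hypothetical figures, chosen only to make the arithmetic easy to follow.
y_hat, var_hat, gain = integrate(
    y_E=1000.0, y_A=1100.0, beta0=50.0, beta1=1.0, var_u=400.0, var_e=100.0
)
```

With these made-up inputs the register deviates from the survey by 50 units after removing the fixed component, and 80% of that deviation is transferred to the survey estimate because the survey is the noisier of the two sources.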

Estimation and validation
The model defined by Eqs. [1] and [2] has been estimated using data on crop surfaces registered by Spanish agencies, with ESYRCE as the random area sample (the series run from 1990 to 2005). Model parameters are estimated using the estimators suggested in Fuller (1987, pp. 13-15), which assume that $\sigma_u^2$ is known.
Integration requires the relationship between the data from the two sources not to be spurious, that is, the deviations $v_t = \hat{y}_{At} - \beta_0 - \beta_1\hat{y}_{Et}$ have to be stable in the long run: in this case it is said that the data series generated by both estimators, $\hat{y}_{Et}$ and $\hat{y}_{At}$, are co-integrated. If the series are stationary (so that the mean and the variance do not change over time), then they are co-integrated. The Augmented Dickey-Fuller test has been used to verify the stationarity of the series. If one or both series are not stationary (and behave as random walks), the Phillips-Ouliaris test is used to verify co-integration (Maddala and Kim, 1998).
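As an illustration of parameter estimation when $\sigma_u^2$ is known, the sketch below uses the standard method-of-moments estimators for a linear model with measurement error in the explanatory variable. It assumes these are the estimators meant by the citation to Fuller (1987); the function name and the data are hypothetical.

```python
def moment_estimates(y_E, y_A, var_u):
    """Estimate beta0 and beta1 of Eq. [1] from the two series.

    y_E: survey (DE) series, y_A: register series, var_u: known sampling
    variance of the survey estimator. Assumes the usual errors-in-variables
    moment estimators; this is a sketch, not the article's own code.
    """
    T = len(y_E)
    mean_E = sum(y_E) / T
    mean_A = sum(y_A) / T
    # Sample variance of the survey series and covariance between series.
    m_EE = sum((e - mean_E) ** 2 for e in y_E) / (T - 1)
    m_EA = sum((e - mean_E) * (a - mean_A) for e, a in zip(y_E, y_A)) / (T - 1)
    if m_EE <= var_u:
        raise ValueError("sample variance does not exceed the error variance")
    # Removing var_u corrects the attenuation caused by sampling error.
    beta1 = m_EA / (m_EE - var_u)
    beta0 = mean_A - beta1 * mean_E
    return beta0, beta1


# Hypothetical series in which the register is exactly 1 + 2 * survey.
survey = [10.0, 12.0, 14.0, 16.0, 18.0]
register = [21.0, 25.0, 29.0, 33.0, 37.0]
ols_like = moment_estimates(survey, register, var_u=0.0)
corrected = moment_estimates(survey, register, var_u=2.0)
```

With `var_u=0.0` the estimators coincide with ordinary least squares. A positive `var_u` gives a steeper slope, which reflects the correction for the sampling error in the survey series.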

Results
The co-integration of two data series has been studied: the series registered by Spanish agencies (the document known as the «Anuario», based on farmers' statements about the acreage of their crops) and the series of crop area estimates provided by ESYRCE (using the DE estimator). The study covers large crop groups (cereals, legumes, root crops, industrial crops, fodder crops, citrus crops, fruit trees, vines and olive trees), for the crops and the Spanish Autonomous Communities where the available series were long enough to allow these tests to be performed.
If the series of crop areas recorded in the administrative registers and the series of crop areas provided by ESYRCE are co-integrated, then the BLUE estimator that results from integrating both series outperforms the DE estimator currently used in ESYRCE. Table 1 shows the estimates of the error of these two estimators, together with the gain in the precision of crop area estimates due to the BLUE estimator. This gain is achieved without any increase in cost, since the required data are already available.
Concerning cereal crops in Castile and Leon, Andalusia and Aragon, the «Anuario» cereal area data series are co-integrated with the series of cereal area estimates provided by ESYRCE. In these Regions, the «Anuario» cereal area data series are statistically validated (they are coherent with ground data observed in the area sample) and their integration with ESYRCE's cereal area series reduces the error of the BLUE estimator with respect to that of the DE estimator used in ESYRCE, as Table 1 shows. In Castile-La Mancha and Extremadura, over half of the «Anuario» data series concerning cereal area are co-integrated with the series of cereal area estimates provided by ESYRCE, and for these series the error of the BLUE estimator is reduced by amounts analogous to those mentioned above. In the remaining Spanish Autonomous Communities (excluding Galicia, Asturias, Cantabria, the Basque Country and the Canary Islands, where cereals are not grown) this is not the case, and hence the «Anuario» data series on cereal areas are not statistically validated (they are not coherent with ground data observed in the area sample).
Legume cultivation is concentrated in the Autonomous Communities of Castile and Leon, Castile-La Mancha and Andalusia, meaning that only these three regions have the necessary information available to analyse co-integration of the series. In these regions, the «Anuario» data series on legume area are co-integrated with the series of legume area estimates provided by ESYRCE; hence they are statistically validated, and the integration of the «Anuario» data series with ESYRCE reduces the error of the BLUE estimator, as Table 1 shows. With reference to root crops, the «Anuario» root area data series are not co-integrated with the series of root area estimates provided by ESYRCE in any of the Spanish Autonomous Communities. The potato crop in La Rioja is the exception, and its integration with ESYRCE reduces the error of the BLUE estimator. The majority of the «Anuario» data series concerning industrial crops area are co-integrated with the series of industrial crops area estimates provided by ESYRCE in Castile and Leon, and their integration reduces the error of the BLUE estimator (see Table 1). In Andalusia and Aragon, this happens in less than half the cases, and for these cases (cotton and sunflower) the error of the BLUE estimator can be reduced as shown in Table 1. In the remaining Autonomous Communities this does not happen in any of the cases, and this is why the «Anuario» data series concerning industrial crops area are not statistically validated in those Regions.
Six fodder crops have been considered (maize, lucerne, vetch, turnip, beetroot and cabbage), and there are only enough data on the first three crops to perform an analysis in all Autonomous Communities. In Aragon and Castile and Leon, the «Anuario» lucerne area data series are co-integrated with the series of lucerne area estimates provided by ESYRCE, and their integration reduces the error of the BLUE estimator as shown in Table 1. In the remaining Regions, the «Anuario» data series are not co-integrated with the series provided by ESYRCE, so the «Anuario» data series concerning fodder crops area in those Regions are not statistically validated.
As far as citrus crops are concerned, there is a close relationship between both series in Valencia. However, there were notable differences in 1998 and 1999, and these deviations have brought about the lack of co-integration. In Andalusia, the «Anuario» orange tree area data series are co-integrated with the orange tree area estimates provided by ESYRCE, and the error of the BLUE estimator can be reduced as shown in Table 1. In other Regions the series are not co-integrated. Most of the data series registered by Spanish agencies concerning fruit tree area are not co-integrated with the series of fruit tree area estimates provided by ESYRCE. In Valencia there are two exceptions, apricot trees and plum trees, and the integration of both series reduces the error of the BLUE estimator from 11.74% and 20.63% to 8.68% and 17.85%, respectively.
The «Anuario» data series on vineyard areas registered in Castile-La Mancha and Valencia are co-integrated with the series of vineyard area estimates provided by ESYRCE, and their integration reduces the error of the BLUE estimator as shown in Table 1. Concerning olive trees, the «Anuario» olive tree area data series are co-integrated with the series of olive tree area estimates provided by ESYRCE in Andalusia and Castile-La Mancha, and their integration reduces the error of the BLUE estimator as shown in Table 1. More details on these results can be found in Ambrosio et al. (2007).

Estimation in small areas
In official information systems, samples are generally designed to provide direct estimates (based solely on field data) that are robust (not dependent on models) and precise in large regions. However, direct estimates in small areas (province and/or municipality), based solely on the sample designed for the large area of which they are part, are not sufficiently precise in most cases, given that the part of the field sample falling in each area is usually very small. A further disadvantage is that it is not possible to obtain direct estimates for small areas where no field data are available.
Sometimes it is possible to obtain the data required for provinces and/or municipalities on the basis of censuses, government agency data or experts' opinions. However, censuses are costly and are carried out over a lengthy period, and it is for this reason that they are not updated. Meanwhile, government agency data and experts' opinions, although not subject to sampling errors, are subject to unknown measurement errors and, as a consequence, their estimates tend to be poor (Jiang and Lahiri, 2006a) and cannot be validated.
Specific estimation methods are required for small areas, a «small area» being defined as an area where direct estimation techniques do not provide estimates with the required degree of precision (Rao, 2003). The literature suggests a number of direct estimation techniques that use auxiliary information, and statistical methods to combine auxiliary information with field information (Särndal, 1984). Some of these techniques are model-dependent: their estimates have good statistical properties, but only if the model is correct (they are not robust). In order to reduce the risks of an incorrect specification of the model, mixed estimation techniques have been suggested, providing indirect estimates that are based on models and are design-consistent (robust) (You and Rao, 2002; Jiang and Lahiri, 2006b).
This article assesses techniques used to improve the precision of estimation in small areas, and ESYRCE is used to illustrate their application. The aim is to estimate the total surface of a certain type of crop in each of the $m$ municipalities of a given region. It is considered that the $i$-th municipality, $i = 1,2,\dots,m$, comprises $N_i$ sampling units (segments). Associated with the $j$-th segment, $j = 1,2,\dots,N_i$, of the $i$-th municipality is a set of values $\{y_{ij}; \mathbf{x}_{ij}\}$, where $y_{ij}$ is the real crop surface (unknown, but fixed) and $\mathbf{x}_{ij}$ is a vector of known values of the auxiliary variables. Remote sensing is the most commonly used source of auxiliary information.
In the indirect estimation strategy that we will follow, the estimation problem reduces to predicting the values of $y_{ij}$ in the segments not included in the field sample (Royall, 1970; Royall and Herson, 1973). When $n_i$ is the size of the field sample in the $i$-th municipality and $\{(y_{ij}, \mathbf{x}_{ij});\ j = 1,2,\dots,n_i\}$ are the data observed in this sample, an estimator of the total is $\hat{Y}_i = n_i\bar{y}_i + (N_i - n_i)\hat{Y}_{i*}$, where

$$\bar{y}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} y_{ij}, \qquad \hat{Y}_{i*} = \frac{1}{N_i - n_i}\sum_{j^*=n_i+1}^{N_i} \hat{y}_{ij^*},$$

and $\hat{y}_{ij^*}$ is the prediction of the value of the crop surface in the $j^*$-th sampling unit not included in the field sample.
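The prediction-based total can be sketched as the sum of the observed surfaces in the sampled segments plus the predicted surfaces in the segments outside the field sample. The function name and the figures below are hypothetical, and the predictions are taken as given here.

```python
def municipality_total(y_sampled, y_predicted):
    """Total crop surface of one municipality.

    y_sampled: observed surfaces in the n_i segments of the field sample.
    y_predicted: model predictions for the N_i - n_i remaining segments.
    """
    n_i = len(y_sampled)
    n_out = len(y_predicted)
    mean_sampled = sum(y_sampled) / n_i
    # Mean prediction over the segments outside the sample (0 if none remain).
    mean_predicted = sum(y_predicted) / n_out if n_out else 0.0
    return n_i * mean_sampled + n_out * mean_predicted


# Hypothetical municipality with 2 sampled and 3 non-sampled segments.
total = municipality_total([10.0, 14.0], [11.0, 12.0, 13.0])
```

The quality of the total therefore depends entirely on the predictions for the non-sampled segments, which is why the choice of model matters.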

The model
The prediction $\hat{y}_{ij^*}$ is obtained from a statistical model linking field data, $\{y_{ij};\ j = 1,2,\dots,n_i;\ i = 1,2,\dots,m\}$, with auxiliary information, $\{\mathbf{x}_{ij};\ j = 1,2,\dots,N_i;\ i = 1,2,\dots,m\}$. Two types of (mixed) models have been proposed: a disaggregated model at the level of individual sampling units (Battese et al., 1988) and an aggregated model at the level of small areas (Fay and Herriot, 1979).
In this case, the disaggregated model

$$y_{ij} = \mathbf{x}_{ij}^{T}\boldsymbol{\beta} + v_i + e_{ij} \qquad [3]$$

shall be used.
According to this model it is possible to obtain a measurement, $Ey_{ij} = \mathbf{x}_{ij}^{T}\boldsymbol{\beta}$, of the crop surface in a certain segment, $\{y_{ij};\ j = 1,2,\dots,N_i;\ i = 1,2,\dots,m\}$, as a function of the auxiliary information and a parameter vector, $\boldsymbol{\beta}$. The error of this measurement, $u_{ij} = y_{ij} - \mathbf{x}_{ij}^{T}\boldsymbol{\beta}$, is the effect of numerous causes that are grouped into two large categories, $u_{ij} = v_i + e_{ij}$: some causes affect all segments of the same municipality equally, and their effect is $v_i$; others vary from one segment to another, and their effect is $e_{ij}$.
Model [3] is a way of specifying that the measurement errors, $u_{ij}$, are correlated. For example, in remote sensing the auxiliary information, $\mathbf{x}_{ij}$, is the reflectance of sunlight on the Earth's surface, captured by the sensors. This light is dispersed in such a way that the reflectance of a pixel is distributed over several adjacent pixels, with the input of an error, $v_i$, that depends on the environmental surroundings (soil and climate) where the measurements are taken and has equal effects on all segments in the same surroundings. The specific error component in each segment, $e_{ij}$, is attributable to the instrument itself (the sensor) and to the data processing from the sensor to the image (Labovitz and Masuoka, 1984; Webster et al., 1989; Haining, 1990). Both error components, $v_i$ and $e_{ij}$, are regarded as independent random variables with zero means and variances $\sigma_v^2$ and $\sigma_e^2$, respectively. The total error, $u_{ij} = v_i + e_{ij}$, therefore has zero mean and variance $\sigma_v^2 + \sigma_e^2$, and the errors of two different segments of the same municipality have covariance $\sigma_v^2$.

Estimation
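The BLUE of $\boldsymbol{\beta}$ is the generalised least squares estimator $\hat{\boldsymbol{\beta}} = (\mathbf{X}^{T}\mathbf{V}^{-1}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{V}^{-1}\mathbf{y}$, where $\mathbf{V} = \mathrm{Var}(\mathbf{y})$ is a block-diagonal matrix with blocks $\mathbf{V}_i = \sigma_e^2\mathbf{I}_{(n_i)} + \sigma_v^2\mathbf{1}_{(n_i)}\mathbf{1}_{(n_i)}^{T}$, where $\mathbf{I}_{(n_i)}$ is the identity matrix of order $n_i$ and $\mathbf{1}_{(n_i)}$ is a column vector ($n_i \times 1$) of ones, $i = 1,2,\dots,m$. The predictor of the crop surface in a segment outside the field sample is $\hat{y}_{ij^*} = \mathbf{x}_{ij^*}^{T}\hat{\boldsymbol{\beta}} + \hat{v}_i$, with $\hat{v}_i = \gamma_i(\bar{y}_i - \bar{\mathbf{x}}_i^{T}\hat{\boldsymbol{\beta}})$ and $\gamma_i = \sigma_v^2/(\sigma_v^2 + \sigma_e^2/n_i)$. By substituting this predictor in $\hat{Y}_i = n_i\bar{y}_i + (N_i - n_i)\hat{Y}_{i*}$, the indirect estimator of the municipality total is obtained.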
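The role of the municipality effect in the prediction can be illustrated with a short sketch. It assumes the usual shrinkage form for a nested-error model such as model [3], with the parameter vector and the variance components treated as known; the function names and the figures are hypothetical.

```python
def shrinkage(n_i, var_v, var_e):
    """Weight given to the municipality's own residual mean.

    Assumed form: var_v / (var_v + var_e / n_i). It tends to 1 as the
    field sample grows and to 0 when the sample is very small.
    """
    return var_v / (var_v + var_e / n_i)


def linear_part(x, beta):
    """Fixed part of the model, x' beta."""
    return sum(a * b for a, b in zip(x, beta))


def predict_unsampled(x_new, beta, gamma_i, y_bar, x_bar):
    """Predict the crop surface in segments outside the field sample."""
    # Predicted municipality effect: shrunken mean residual of the sample.
    v_hat = gamma_i * (y_bar - linear_part(x_bar, beta))
    return [linear_part(x, beta) + v_hat for x in x_new]


# Hypothetical values: intercept plus one remote-sensing covariate.
beta = [2.0, 0.5]
gamma = shrinkage(n_i=2, var_v=4.0, var_e=8.0)
predictions = predict_unsampled(
    x_new=[[1.0, 10.0], [1.0, 20.0]],
    beta=beta, gamma_i=gamma, y_bar=12.0, x_bar=[1.0, 16.0],
)
```

Here the sample mean exceeds the model's fitted mean by 2 units, and half of that excess is carried over to every non-sampled segment of the municipality.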
Several works can be found in the literature (Battese et al., 1988; Ambrosio and Iglesias, 2000) where model [3] is checked using remote sensing data. In particular, it is shown that the parameters $\boldsymbol{\beta}$ and $\sigma_v^2$ are significantly different from zero.
The direct estimator is the mean of field data, $\bar{y}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} y_{ij}$.

Results
Ambrosio and Iglesias (2000) show that, for a case similar to that studied here, the relative efficiency of the proposed indirect estimator vis-à-vis the direct estimator ranges from one small area to another between 2.08 (when the field sample in the small area comprises 17 segments) and 46.37 (when the field sample in the small area comprises only one segment). In other terms, in the first case the direct estimator requires a sample of 2.08 × 17 ≈ 35 segments to reach the same precision as the indirect estimator using a sample of 17 segments and the auxiliary remote sensing information. In the second case, the direct estimator requires a sample of 46 segments to reach the same precision as the indirect estimator using a sample of one segment and the auxiliary remote sensing information.
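The conversion from relative efficiency to an equivalent sample size is a single multiplication, shown here with the two figures quoted above. The function name is hypothetical.

```python
def equivalent_sample_size(relative_efficiency, n_segments):
    """Segments the direct estimator needs to match the indirect one."""
    return relative_efficiency * n_segments


# The two cases reported by Ambrosio and Iglesias (2000).
large_sample_case = equivalent_sample_size(2.08, 17)
single_segment_case = equivalent_sample_size(46.37, 1)
```

Rounded to whole segments, the two results are 35 and 46, matching the figures in the text.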

Crop area forecasts based on models from the series of annual crop area estimates
Very often, official information systems do not reach the level of temporal resolution required, as they do not report the changes occurring over a given campaign and their results are obtained after the end of the campaign, meaning that they are not available in time to take decisions. In order to improve the temporal resolution of these systems, we suggest obtaining forecasts from surveys using models of their evolution and trends, and correcting the results of these models using field data observed on several occasions over the course of the campaign under way, according to a scheduled timetable. The method applied is illustrated with ESYRCE.
The total annual surface $y_t$ of a given crop in a certain region is currently estimated in ESYRCE using solely field data observed in year $t$. In this article, to improve the precision of ESYRCE (though at the expense of its robustness) and to produce forecasts at the start of each campaign, before field data become available, we also suggest using the estimates from previous years, $\{\hat{y}_{t'};\ t' = t-1, t-2, \dots, 1\}$, with the help of a statistical model to transfer information from one year to another.

The model
The field data in each sample of the sequence are grouped into «elementary estimates» (Gurney and Daly, 1965), $\{\hat{y}_t;\ t = 1,2,\dots,T\}$, using the (unbiased) DE estimator in each period of time. The statistical properties of the series of elementary DE crop area estimates of $\{y_t;\ t = 1,2,\dots,T\}$ are those specified in model [2] above, $\hat{y}_t = y_t + u_t$. According to [2], the variability of the series of elementary DE estimators is due to two main factors: one is sampling, $V(\hat{y}_t|y_t) = \sigma_u^2$, and the other is the natural variability of the series $\{y_t;\ t = 1,2,\dots,T\}$. The (marginal) variance of $\{\hat{y}_t;\ t = 1,2,\dots,T\}$ is $V(\hat{y}_t) = V(y_t) + \sigma_u^2$. The variability due to sampling, $\sigma_u^2$, depends on the type of design; in what follows it is considered known and equal to that of the DE estimator currently used in ESYRCE.
In order to model the natural variability of the series $\{y_t;\ t = 1,2,\dots,T\}$, several models have been proposed in the literature. Patterson (1950) considers the model $y_{it} - \bar{Y}_t = \rho(y_{i,t-1} - \bar{Y}_{t-1}) + \varepsilon_{it}$, where $\bar{Y}_t$ is the population average of the $y_{it}$, $y_{it}$ is the surface of the crop in question in the $i$-th sampling unit and $\varepsilon_{it}$ is the perturbation term of the model, with zero mean and uncorrelated with the $u_t$ and with one another. In addition, it is supposed that $\rho$ is known and that $\sigma_u^2$ is approximately constant over time. Note that no relationship is specified between $\bar{Y}_t$ and $\bar{Y}_{t'}$ for $t' = t-1, t-2, \dots, 1$.
If the $\bar{Y}_t$ evolve over time, it is possible to improve the estimates by specifying this relationship with a statistical model (Tam, 1986, 1987). Blight and Scott (1973) suggest adding the equation $\bar{Y}_t - \mu = \lambda(\bar{Y}_{t-1} - \mu) + \eta_t$ to the Patterson model. The effect of including these models in the estimation process was explored by Scott and Smith (1974), using results from the theory of signal extraction in the presence of stationary noise. A comparison of these approaches can be found in Jones (1979). Jones (1980), using model [2], $\hat{y}_t = y_t + u_t$, shows how the different approaches of the work mentioned above can be unified in an approach based on mixed models (with fixed and random coefficients) of the form $\hat{\mathbf{y}} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{y} + \mathbf{u}$, where $\hat{\mathbf{y}}$, $\mathbf{y}$ and $\mathbf{u}$ are ($T \times 1$) vectors with components $\{\hat{y}_t;\ t = 1,2,\dots,T\}$, $\{y_t;\ t = 1,2,\dots,T\}$ and $\{u_t;\ t = 1,2,\dots,T\}$, respectively, $\mathbf{X}$ and $\mathbf{Z}$ are known matrices and $\boldsymbol{\beta}$ is a vector of $P$ unknown fixed parameters (fixed effects). $\mathbf{u}$ and $\mathbf{y}$ (random effects) are random vectors independent of one another, $\mathrm{Cov}(\mathbf{y},\mathbf{u}) = \mathbf{0}$, with means $E\mathbf{u} = \mathbf{0}$ and $E\mathbf{y} = \boldsymbol{\mu}$, and covariances $\mathrm{Var}(\mathbf{u}) = \mathbf{R}$ and $\mathrm{Var}(\mathbf{y}) = \mathbf{G}$, respectively. The vector of estimators $\hat{\mathbf{y}}$ is random, with expectation $E\hat{\mathbf{y}} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\boldsymbol{\mu}$ (which reduces to $E\mathbf{y}$ when there are no fixed effects and $\mathbf{Z}$ is the identity matrix) and covariance $\mathrm{Var}(\hat{\mathbf{y}}) = \mathbf{Z}\mathbf{G}\mathbf{Z}^{T} + \mathbf{R}$.

Estimation
The aim is to estimate $\mathbf{y}$ more efficiently than $\hat{\mathbf{y}}$ does and to predict $y_{T+1}$. An optimum estimator for $\mathbf{y}$, which is therefore more efficient than $\hat{\mathbf{y}}$, is the Best Linear Unbiased Predictor (BLUP), $\hat{\mathbf{y}}_{(BLUP)}$. Note that $\hat{\mathbf{y}}_{(BLUP)}$ makes use of the sample for period $t$ and of the prior estimates $\{\hat{y}_{t'};\ t' = t-1, t-2, \dots, 1\}$. Eq. [2], which in vector notation is $\hat{\mathbf{y}} = \mathbf{y} + \mathbf{u}$, is a particular case of the mixed model in which it is assumed that the series $\{y_t;\ t = 1,2,\dots,T\}$ is stationary in the mean (constant), that is, there are no fixed effects, $\boldsymbol{\beta} = \mathbf{0}$, and $\mathbf{Z}$ is the identity matrix. In this specific case, $\hat{\mathbf{y}}_{(BLUP)} = \boldsymbol{\mu} + \mathbf{G}(\mathbf{G} + \mathbf{R})^{-1}(\hat{\mathbf{y}} - \boldsymbol{\mu})$. The estimator $\hat{\mathbf{y}}_{(BLUP)}$ requires the specification of the correlation structure of the series of values to be estimated, $\{y_t;\ t = 1,2,\dots,T\}$, and of the series of errors of the ESYRCE estimation, $\{u_t;\ t = 1,2,\dots,T\}$, that is, the identification of $\mathrm{Var}(\mathbf{y}) = \mathbf{G}$ and $\mathrm{Var}(\mathbf{u}) = \mathbf{R}$.
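For the particular case of Eq. [2] (no fixed effects, identity design matrix), the predictor can be sketched with a few lines of linear algebra. The sketch assumes the covariance matrices G and R and the mean vector are known, and uses the standard best linear predictor for that setting. The solver, the function names and the numbers are hypothetical and are unrelated to the article's SAS/IML programme.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def blup(y_hat, G, R, mu):
    """Smooth the elementary estimates: mu + G (G + R)^{-1} (y_hat - mu)."""
    T = len(y_hat)
    V = [[G[i][j] + R[i][j] for j in range(T)] for i in range(T)]
    residual = [y_hat[i] - mu[i] for i in range(T)]
    w = solve(V, residual)
    return [mu[i] + sum(G[i][j] * w[j] for j in range(T)) for i in range(T)]


# Hypothetical two-year example with correlated true surfaces.
G = [[4.0, 2.0], [2.0, 4.0]]   # Var(y): natural variability of the series
R = [[4.0, 0.0], [0.0, 4.0]]   # Var(u): independent sampling errors
smoothed = blup(y_hat=[14.0, 10.0], G=G, R=R, mu=[10.0, 10.0])
```

In this example the first-year estimate is pulled towards the mean because half of its variance is sampling noise, while the second-year value is raised slightly because the true surfaces of the two years are positively correlated.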
In order to identify the correlation structure of $\{y_t;\ t = 1,2,\dots,T\}$, first-order (stationary) auto-regressive processes and (non-stationary) random walks are considered. The model $y_t = \rho y_{t-1} + \varepsilon_t$, with $|\rho| < 1$, $E\varepsilon_t = 0$, $\mathrm{Cov}(\varepsilon_t, \varepsilon_{t'}) = \sigma_\varepsilon^2$ if $t = t'$ and $\mathrm{Cov}(\varepsilon_t, \varepsilon_{t'}) = 0$ otherwise, defines a first-order auto-regressive process, AR(1), with the form $y_t = \sum_{j=0}^{\infty}\rho^{j}\varepsilon_{t-j}$, whose auto-covariance function is $C(s) = \sigma_y^2\rho^{s}$. This function allows the calculation of the covariances $\mathrm{Cov}[y_t, y_{t'}]$ as a function of the lag $s = t - t'$, for $s = 0,1,2,\dots$, and of the variance $\sigma_y^2 = \sigma_\varepsilon^2/(1 - \rho^2)$. The auto-correlation function is $\rho(s) = \rho^{s}$, as $C(0) = \sigma_y^2$. The model $y_t = y_{t-1} + \varepsilon_t$ defines a random walk. The matrix of variances and covariances of a random walk is $\mathrm{Var}(\mathbf{y}) = \sigma_\varepsilon^2\{\min(t,t')\}$. Note that in a random walk the variance, $Vy_t = \sigma_\varepsilon^2 t$, grows with $t$, while in the (stationary) first-order auto-regressive process the variance converges to a constant as $t$ increases, $Vy_t \to \sigma_\varepsilon^2/(1 - \rho^2)$. However, the random walk may be transformed into a stationary process by differencing it once, $\Delta y_t = y_t - y_{t-1} = \varepsilon_t$, in such a way that a zero-mean process results, $E\Delta y_t = E\varepsilon_t = 0$, with finite variance, $V\Delta y_t = \sigma_\varepsilon^2$. The differenced series $\{\Delta y_t;\ t = 2,\dots,T\}$ is stationary, with correlation structure $\mathrm{Var}(\boldsymbol{\Delta}\mathbf{y}) = \boldsymbol{\Delta}\,\mathrm{Var}(\mathbf{y})\,\boldsymbol{\Delta}^{T}$, where $\boldsymbol{\Delta}$ is the $(T-1) \times T$ matrix of first-difference operators, whose generic row has null elements except in positions $t-1$ and $t$, which take the values $-1$ and $+1$, respectively.
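The two covariance structures, and the effect of differencing a random walk, can be checked numerically. This is a small sketch with hypothetical function names and parameter values.

```python
def ar1_cov(T, rho, var_eps):
    """Covariance matrix of a stationary AR(1): var_y * rho**|t - t'|."""
    var_y = var_eps / (1.0 - rho ** 2)
    return [[var_y * rho ** abs(t - s) for s in range(T)] for t in range(T)]


def random_walk_cov(T, var_eps):
    """Covariance matrix of a random walk: var_eps * min(t, t')."""
    return [[var_eps * min(t, s) for s in range(1, T + 1)]
            for t in range(1, T + 1)]


def first_difference_matrix(T):
    """(T-1) x T matrix with -1 and +1 in consecutive positions of each row."""
    D = [[0.0] * T for _ in range(T - 1)]
    for r in range(T - 1):
        D[r][r] = -1.0
        D[r][r + 1] = 1.0
    return D


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


G_ar1 = ar1_cov(T=3, rho=0.5, var_eps=3.0)
G_rw = random_walk_cov(T=4, var_eps=2.0)
D = first_difference_matrix(4)
# Covariance of the differenced random walk: D Var(y) D'.
diff_cov = matmul(matmul(D, G_rw), transpose(D))
```

The differenced random walk has a diagonal covariance matrix with the perturbation variance on the diagonal, which is the numerical counterpart of the statement that one difference makes the series stationary.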
The correlation structure of the series of estimation errors of the area sample, $\{u_t;\ t = 1,2,\dots,T\}$, is […]; $\forall \varsigma \neq 0$, where […]. The matrix $\underline{R}$ is estimated by evaluating $S^2_{0\varsigma}$ in the usual way, given the sample design. The estimation of $\underline{G}$ depends on the correlation structure of $\{y_t;\ t = 1,2,\dots,T\}$, which is estimated on the basis of the series of estimates $\{\hat{y}_t;\ t = 1,2,\dots,T\}$. If the series of estimates is stationary (first-order autoregressive), then $\underline{G}$ is estimated using $\hat{\rho} = \sum_t \hat{y}_t \hat{y}_{t-1} / \sum_t \hat{y}_{t-1}^2$ and $\hat{\sigma}^2_{\hat{y}} = \hat{\sigma}^2_e / (1 - \hat{\rho}^2)$, where […].

Predictions
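In order to predict the series of values $\{y_{T+h};\ h = 1,2,\dots\}$ on the basis of the estimated series $\{\hat{y}_{t(BLUP)};\ t \le T\}$, we suggest using the predictor $\hat{y}_{T+h} = \underline{a}^T \hat{\underline{y}}_{(BLUP)}$, where $\underline{a}$ is obtained from the covariance vector $[C_{1,T+h} \dots C_{t,T+h} \dots C_{T,T+h}]^T$ and the $C_{t,T+h} = \mathrm{Cov}(\hat{y}_t, \hat{y}_{T+h})$ are autocovariance functions of the form $C_{t,t+\tau} = C(\tau)$. In the case of a random walk, the autocovariance function is $C_{t,T+h} = \sigma_\varepsilon^2 \min(t, T+h) = \sigma_\varepsilon^2 t$ for $t \le T$.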
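A small numerical sketch may help to see what such a predictor does. It takes, as a working assumption that is not confirmed by the text, the usual best-linear-predictor weights $\underline{a} = \underline{G}^{-1}\underline{c}$, with $\underline{c} = [C_{1,T+h} \dots C_{T,T+h}]^T$. Under that assumption the AR(1) forecast is $\rho^h$ times the last value of the series and the random-walk forecast is the last value itself. Function names and values are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def predict(y_blup, G, c):
    """Linear predictor a^T y_blup with assumed weights a = G^{-1} c."""
    a = solve(G, c)
    return sum(w * y for w, y in zip(a, y_blup))


T, h = 4, 2
y_blup = [1.0, 2.0, 3.0, 4.0]          # hypothetical BLUP series

# Stationary AR(1): C(s) = sigma2_y * rho**|s|
rho, sigma2_y = 0.5, 1.0
G_ar = [[sigma2_y * rho ** abs(t - s) for s in range(T)] for t in range(T)]
c_ar = [sigma2_y * rho ** (T + h - t) for t in range(1, T + 1)]
pred_ar = predict(y_blup, G_ar, c_ar)  # close to rho**h * y_blup[-1]

# Random walk: Cov(y_t, y_t') = sigma2_eps * min(t, t')
sigma2_eps = 1.0
G_rw = [[sigma2_eps * min(t, s) for s in range(1, T + 1)]
        for t in range(1, T + 1)]
c_rw = [sigma2_eps * t for t in range(1, T + 1)]   # min(t, T + h) = t
pred_rw = predict(y_blup, G_rw, c_rw)  # close to y_blup[-1]

print(pred_ar, pred_rw)
```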

Results
In addition to producing forecasts at the start of each campaign, before field data become available, the error of the DE estimator currently used in ESYRCE can be reduced using this BLUP. Table 2 shows the reductions achieved by the BLUP for cereal area estimates.
Reductions of the same order can be achieved for the error of the area estimates for legumes, industrial crops, olive trees, vineyards and orange trees. The results of applying this model to the series of ESYRCE estimates for the main crops can be found in Ambrosio et al. (2007).

Crop rotation models
It is common to use longitudinal or panel samples, where the same sampling unit (segment) is observed repeatedly year after year. This is the case of ESYRCE, which supplies information to estimate crop rotation, that is, the distribution of the surface $Y_{i_{t-1}l}$ occupied by a given crop $l$ in campaign $t-1$ among the $J$ crops of the alternative in the subsequent campaign, $t$. Knowing this rotation may be useful, among other reasons, to obtain forecasts of the crop alternatives of a campaign from those of the previous campaign, to specify spatial econometric models, and as a tool for updating regional censuses and disaggregating their information. We follow an approach that combines model-dependent and direct estimation to obtain model-assisted indirect estimates that are design-consistent, that is, robust. Direct estimation is based on field data observed in one of the three systematic samples making up ESYRCE.

The model
It is assumed that the distribution of the surface $Y_{i_{t-1}l}$ under a certain crop $l$ in campaign $t-1$ among the $J$ crops of the alternative in the following campaign is multinomial, $MN(Y_{i_{t-1}l};\ p_{i_t l1}, \dots, p_{i_t lJ})$, where $\sum_{j=1}^{J} p_{i_t lj} = 1$ and $p_{i_t lj}$ represents the proportion of the surface under crop $l$ in the $i$-th sample unit during campaign $t-1$ that becomes crop $j$ in the following campaign, $t$. The parameters, Eq. [4], are specified as a function of the crop alternatives of the previous campaign, $\{Y_{i_{t-1}l};\ l = 1,2,\dots,J\}$.

Estimation and validation
The regression coefficients $\{\beta^{j'}_{lj};\ j' = 1,2,\dots,J\}$ are considered homogeneous across sampling units and are estimated over time following a generalised cross-entropy approach (Golan et al., 1996). Forecasts are calculated for the immediately subsequent campaign, $t = T + 1$. The forecast of the surface under crop $j$ in the $i$-th sampling unit of the sample in campaign $T+1$ is $\hat{Y}_{i_{T+1}j} = \sum_{l=1}^{J} Y_{i_T l}\,\hat{p}_{i_{T+1}lj}$, where $\hat{p}_{i_{T+1}lj}$ is the estimate of $p_{i_{T+1}lj}$ resulting from replacing in [4] the coefficients $\beta^{j'}_{lj}$ with their estimates $\hat{\beta}^{j'}_{lj}$. The forecast […$(1 - p_{i_{T+1}lj})\,\underline{x}_{i_{T+1}}]^T$, where $\underline{x}_{i_{T+1}}$ is the $(1 \times J)$ vector of values of the auxiliary variables $\{Y_{i_T j'};\ j' = 1,2,\dots,J\}$, $\hat{\underline{\beta}}_{lj}$ is a vector whose components are $\{\beta^{j'}_{lj};\ j' = 1,2,\dots,J\}$ and […].

Crops area forecasts
We will illustrate the use of model [4] to obtain forecasts of crop surfaces at the regional level. It is proposed to obtain these forecasts from the area sample, using the forecasts $\{\hat{Y}_{i_{T+1}j};\ j = 1,2,\dots,J;\ i_{T+1} = 1,2,\dots,n_T\}$ instead of the values $Y_{i_{T+1}j}$, which will be observed at the end of campaign $T+1$ but are not observable at the current time $T$. In ESYRCE, the estimator used is $\hat{Y}_{j,T+1} = \frac{M}{m}\sum_{k=1}^{m}\hat{Y}_{k_{T+1}j}$, where $M$ is the number of systematic samples in the population and $m$ is the number of systematic samples selected. $\hat{Y}_{k_{T+1}j} = \sum_{i=1}^{M_k}\hat{Y}_{i_{T+1},k\,j}$ is the sum of the forecasts $\hat{Y}_{i_{T+1},k\,j}$ of the surface under crop $j$ in the segments of the $k$-th systematic sample, which comprises $M_k$ segments.
At the end of the campaign, the value $Y_{i_{T+1},k\,j}$, which had to be predicted during the campaign using $\hat{Y}_{i_{T+1},k\,j}$ for all $k$ except one, is observed directly. This results in $Y_{k_{T+1}j} = \sum_{i} Y_{i_{T+1},k\,j}$, and the estimator reduces to the usual ESYRCE estimator, $\hat{Y}_{j,T+1} = \frac{M}{m}\sum_{k} Y_{k_{T+1}j}$, as does its variance.
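The expansion from systematic-sample totals to a regional figure can be sketched as follows, assuming the estimator has the usual form $(M/m)\sum_k \hat{Y}_k$, where $\hat{Y}_k$ is the sum of segment values in the $k$-th systematic sample. The value of $M$, the segment surfaces and the function names are hypothetical.

```python
def replicate_total(segment_areas):
    """Total crop surface over the segments of one systematic sample."""
    return sum(segment_areas)


def expansion_estimate(replicate_totals, M):
    """Expand m systematic-sample totals to the population of M samples."""
    m = len(replicate_totals)
    return M / m * sum(replicate_totals)


M = 30  # hypothetical number of systematic samples in the population

# During campaign T+1: model forecasts of the surface of crop j per segment.
forecasts = [[10.0, 20.0], [12.0, 18.0], [11.0, 22.0]]
mid_campaign = expansion_estimate([replicate_total(s) for s in forecasts], M)

# End of campaign: observed surfaces replace the forecasts; the formula is unchanged.
observed = [[9.0, 22.0], [12.0, 19.0], [10.0, 23.0]]
end_campaign = expansion_estimate([replicate_total(s) for s in observed], M)

print(mid_campaign, end_campaign)
```

The only difference between the mid-campaign forecast and the end-of-campaign estimate is the source of the segment values, which is why the forecast estimator reduces to the usual one once field data are available.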

Results
The estimation procedure for the parameters defined in [4] has been checked using ESYRCE data, and the results were satisfactory. Validation has been performed at two levels: one, more disaggregated, is the segment or sampling unit; the other, more aggregated, is the study zone. In the first case, the predictions $\hat{p}_{i_t lj}$ of the proportion of a certain crop, $l = 1,2,\dots,J$, becoming another crop, $j = 1,2,\dots,J$, in the rotation from year $t-1$ to the following year $t$ ($t = 1,2,\dots,T$) have been compared with the observed data, $f_{i_t lj}$, in each unit of the sample, $i_t = 1,2,\dots,n_t$: the linear correlation coefficient between the prediction $\hat{p}_{i_t lj}$ and the observed data $f_{i_t lj}$ is above 0.99 for cotton, olive groves and soft wheat, 0.88 for sunflower and 0.79 for durum wheat. In the second case, the sample averages $\hat{p}_{lj}$ and $\bar{f}_{lj}$, for $l = 1,2,\dots,J$ and $j = 1,2,\dots,J$, have been compared. Figure 1 shows both averages ($\bar{f}_{lj}$ in light grey and $\hat{p}_{lj}$ in grey) when $l$ is irrigated cotton and $j$ is the crop indicated on the x-axis. Figure 2 shows both averages when $j$ is durum wheat and $l$ is the crop indicated on the x-axis.
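The segment-level check described above amounts to a linear correlation between predicted and observed transition proportions. A minimal Python sketch follows, with invented proportions for four hypothetical segments; none of these values are ESYRCE data.

```python
import math


def pearson(x, y):
    """Linear (Pearson) correlation coefficient between two sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical proportions of crop l turning into crop j, per segment.
predicted = [0.10, 0.40, 0.75, 0.90]   # model predictions
observed = [0.12, 0.35, 0.80, 0.88]    # proportions observed in the field

r = pearson(predicted, observed)
mean_predicted = sum(predicted) / len(predicted)   # aggregated (zone-level) check
mean_observed = sum(observed) / len(observed)
print(r, mean_predicted, mean_observed)
```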
In addition, a comparison has been made for each year of how closely the forecasts, $\hat{Y}_{jt} = \sum_{i_t}\hat{Y}_{i_t j}$, approximate the observed data, $\bar{Y}_{jt} = \sum_{i_t} Y_{i_t j}$. Figure 3 shows that the approximation $\hat{Y}_{jt} \simeq \bar{Y}_{jt}$ holds. However, the forecast error adds to the estimation error based solely on field data. In our area of study, we estimate that the increase in the standard deviation of the estimator based on the forecast, vis-à-vis the deviation based on data observed at the end of the campaign, is 17.72% for irrigated cotton, 18.64% for irrigated table olives and 27.33% for sunflower.

Other applications of crop rotation models
The crop rotation model defined by [4] is a useful tool for other applications, including spatial econometric models of soil use dynamics and the updating of territorial censuses. In order to prevent any undesired environmental impact, it is especially relevant to forecast the individual decisions made by farmers and to assess their environmental impact. As shown by Ambrosio et al. (2008), a crop rotation model is a basis for specifying spatial econometric models as a tool to monitor and control the use of natural resources (soil, water and air) by agriculture, as well as the waste from synthetic products used in agriculture, such as pesticides and chemical fertilisers.
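Crop rotation models are also useful for updating territorial censuses and for disaggregating information. The extent to which territorial censuses must be updated depends on the dynamics of soil uses, and these dynamics may be captured by the crop rotation model [4]. Consider a unit $r$ of the census, with area $Y_r$, and its disaggregation into alternative crops, $\{Y_{rj}^{t};\ j = 1,2,\dots,J\}$, where $Y_{rj}^{t}$ is the part of $Y_r$ occupied by crop $j$ at time $t$, and $Y_r = \sum_{j} Y_{rj}^{t}$. An estimate of $Y_{rj}^{t}$ is obtained by iterating $\hat{Y}_{rj}^{t} = \sum_{j'} \hat{Y}_{rj'}^{t-1}\,\hat{p}_{j'j}^{t}$, using model [4] in each iteration to rotate the crop alternative on the basis of the initial $\{Y_{rj}^{0};\ j = 1,2,\dots,J\}$.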
Note that $\hat{Y}_{rj}^{t} = \sum_{j'} \hat{Y}_{rj'}^{t-1}\,\hat{p}_{j'j}^{t}$ and $\sum_{j} \hat{p}_{j'j}^{t} = 1$, so that $\sum_{j} \hat{Y}_{rj}^{t} = \sum_{j'} \hat{Y}_{rj'}^{t-1} = \sum_{j'} \hat{Y}_{rj'}^{0} = Y_r$. An update of the territorial census is obtained merely by aggregating $\{\hat{Y}_{rj}^{t};\ j = 1,2,\dots,J\}$ into the usage categories considered in the census. In the event that the coefficients $\beta_{j'j}^{t}$ were heterogeneous across sampling units and over time, homogeneous zone types and/or time intervals may be established.
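The iteration and its area-preserving property can be illustrated with a short Python sketch. For simplicity it assumes a single transition matrix that stays constant over the iterations, whereas model [4] would supply a new matrix of estimated proportions at each step. The two-crop matrix, the initial areas and the function names are hypothetical.

```python
def rotate(areas, P):
    """One rotation step: new area of crop j = sum over l of areas[l] * P[l][j].
    Each row of P holds the proportions of crop l moving to each crop j."""
    J = len(areas)
    return [sum(areas[l] * P[l][j] for l in range(J)) for j in range(J)]


def update_census(initial_areas, P, steps):
    """Iterate the rotation from the initial disaggregation of a census unit."""
    areas = list(initial_areas)
    for _ in range(steps):
        areas = rotate(areas, P)
    return areas


# Hypothetical two-crop example; each row of P sums to one.
P = [[0.8, 0.2],
     [0.3, 0.7]]
Y0 = [100.0, 50.0]              # initial areas of the two crops in unit r
Y2 = update_census(Y0, P, 2)    # approximately [92.5, 57.5]
print(Y2, sum(Y2))              # the total area of the unit stays at 150
```

Because every row of the transition matrix sums to one, the total area of the census unit is unchanged at every iteration, up to floating-point rounding.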

Agro-meteorological models
Agro-meteorological models are a useful tool for forecasting crop yields depending on weather conditions, and the EU has developed two such models: one (CGMS) for annual crops and another (OLIWIN) for multi-annual crops. These are computing systems that deliver indicators of crop yield with a high spatiotemporal resolution (at the plot level and every 10 days over the campaign) on the basis of environmental data (a soil map and a geo-referenced weather database) and a crop model.
In CGMS (Supit et al., 1994; Van Raaij and van der Wal, 1994; Van der Wal, 1994) the WOFOST crop model is used to calculate yield indicators, such as biomass generated by the crops, dry matter in reserve organs and leaf area indices, as well as the vegetative development level of the crop, including crop water needs (Hijmans et al., 1994). In OLIWIN (European Commission, 1997), the SIMWAT model is used to calculate the water balance. Other elements of the Spanish system also produce indicators of vegetative crop development based on vegetation indices calculated from satellite images (remote sensing). These indicators can be useful for crop yield forecasts and, likewise, indicators calculated on the basis of CGMS and OLIWIN may be useful for the purposes pursued by those other elements.

The model
Crop yield forecasts are obtained from statistical models relating the crop yield observed in the field at the end of the campaign to the indicators calculated on the basis of CGMS and OLIWIN over the campaign. These are indirect estimates resulting from the combination of field information with auxiliary information using the statistical model [3], with slight changes of scale and notation. This model is illustrated using the crop yields recorded in ESYRCE at the plot level, grouping data into UTM blocks of 10 × 10 km.
It is assumed that at moment $t-\tau$ of the campaign under way it is possible to obtain a measurement, $Ey_{ijt} = \underline{x}^T_{ij,t-\tau}\underline{\beta}$, of the yield of a certain crop in plot $j$ of block $i$ at the end of campaign $t$, depending on auxiliary information $\underline{x}_{ij,t-\tau}$ (indicators of the agro-meteorological model at $t-\tau$) and a parameter vector, $\underline{\beta}$. The error of this measurement, $u_{ijt} = y_{ijt} - \underline{x}^T_{ij,t-\tau}\underline{\beta}$, is its deviation from the true value, $y_{ijt}$, and is the result of numerous causes grouped into two large categories, $u_{ijt} = v_{it} + e_{ijt}$: some, $v_{it}$, equally affect all plots within the same block $i$ in the same campaign $t$, while others, $e_{ijt}$, vary from one plot to another. This model is a way of specifying that the measurement errors are correlated. For example, the calculation of the indicators $\underline{x}_{ij,t-\tau}$ relies on soil maps in which the same physical and chemical properties are assigned to all the soil within the same minimum mappable area, introducing an error that equally affects all plots having all or part of their land in that minimum unit.
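The way the decomposition $u_{ijt} = v_{it} + e_{ijt}$ induces correlation between plots of the same block can be made explicit. The sketch below assumes, beyond what is stated above, that $v_{it}$ and $e_{ijt}$ are mutually uncorrelated with constant variances $\sigma_v^2$ and $\sigma_e^2$ and that different blocks are uncorrelated. Under those assumptions two plots of the same block have covariance $\sigma_v^2$ and correlation $\sigma_v^2 / (\sigma_v^2 + \sigma_e^2)$. Names and values are hypothetical.

```python
def error_covariance(blocks, sigma2_v, sigma2_e):
    """Covariance matrix of the errors u = v + e within one campaign.
    blocks[a] is the block label of plot a."""
    n = len(blocks)
    return [[(sigma2_v if blocks[a] == blocks[b] else 0.0)
             + (sigma2_e if a == b else 0.0)
             for b in range(n)] for a in range(n)]


def intra_block_correlation(sigma2_v, sigma2_e):
    """Correlation between the errors of two plots of the same block."""
    return sigma2_v / (sigma2_v + sigma2_e)


# Hypothetical example: three plots, the first two in the same UTM block.
blocks = ["A", "A", "B"]
V = error_covariance(blocks, sigma2_v=1.0, sigma2_e=3.0)
for row in V:
    print(row)
print(intra_block_correlation(1.0, 3.0))
```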
biomass estimated via CGMS in the first 10 days of June was 7.39%. For unirrigated wheat, the error of the prediction for that same 10-day period was 13.93%. The standard error of irrigated sunflower yield forecasts based on potential biomass estimated via CGMS in the first 10 days of August was 3.56%. For unirrigated sunflower, the error of the prediction for that same 10-day period was 16.41%. The standard error of irrigated beetroot yield forecasts based on potential biomass estimated via CGMS in the second 10-day period of November was 5.75%. The standard error of yield predictions for unirrigated olive groves for oil production, carried out in September, was 3.85%.

Selection of farmer samples on the basis of an area sampling frame
A large part of the information required is not directly observable on the ground and must be collected by means of direct interviews with farmers. Generally speaking, samples of farmers to be interviewed are selected on the basis of a list sampling frame (directory of farms). Keeping such a list up to date is costly, which is not the case for an area sampling frame. Selecting samples on the basis of an area sampling frame also offers other advantages, notably the reduction of coverage biases due to incomplete sampling frames (González-Villalobos et al., 1996). This article therefore suggests selecting the sample of farmers on the basis of an area sampling frame. The idea is to identify the farmers who farm land within the area sample segments and to choose a sample of these farmers.
The Department for Agriculture and Fishing of the Andalusian Government has developed an agricultural survey design procedure, analogous to the system applied in this article, based on an area sample for the estimation of surfaces, yields and other structural and technical-economic features of vegetable crops and intensive crops as a whole. Intensive crops, in particular vegetables, are grown successively over the year on the same plot, depending on variable market circumstances, which makes it necessary to interview farmers. A step-by-step description (construction of the area sampling frame, selection of the segment sample and the farmer sample, collection of information, and calculation of estimates and of the estimation error), together with the results of the samples designed with this methodology for the structural and technical-economic features of intensive crops in Andalusia and for the estimation of production costs and their profitability threshold, may be found in Ambrosio et al. (1999, 2006).

Discussion
In order to make decisions, economic actors require information which is timely, precise, specific and which has a high spatiotemporal resolution.Society, via its political representatives, demands information to follow the impact of agricultural systems on the environment, to control waste produced and to design sustainable agricultural systems, which stay productive over time.
Production of this information is a complex task that public authorities are taking care of.This article applies statistical methods to improve spatial and temporal resolution of official agricultural and environmental information, combining field data with auxiliary information from other sources, including administrative registers and remote sensing.The methods applied are checked, validated and illustrated using the agricultural and environmental information system of the Spanish Ministry of the Environment and Rural and Marine Affairs.
A method is applied to integrate data from the registers of the authorities with data from random samples, which allows both the validation of crop area data from the registers and the improvement of the precision of crop area estimates without increasing costs, since the required data are already available. This improvement involves a reduction of the crop area estimator error that varies from one crop to another, ranging from 0.35 (olive groves for oil production) to 12.08 (legumes) percentage points (see Table 1).
A specific method is applied to improve the precision of crop area estimates in small areas (municipalities) with the aid of satellite images. The relative precision gain offered by the proposed indirect estimator vis-à-vis the direct estimator currently used ranges between 2.08 and 46.37 from one small area to another: this means that the standard error of the indirect crop area estimator is between 31% and 85% lower than the standard error of the direct estimator currently used.
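The link between the two pairs of figures can be verified with a short calculation, assuming that the relative precision gain is the ratio of the variance of the direct estimator to that of the indirect estimator. Under that assumption the standard error falls by a factor of $1 - 1/\sqrt{\text{gain}}$. The function name is illustrative.

```python
import math


def standard_error_reduction(gain):
    """Relative reduction of the standard error implied by a precision gain,
    where gain = Var(direct estimator) / Var(indirect estimator)."""
    return 1.0 - 1.0 / math.sqrt(gain)


for gain in (2.08, 46.37):
    pct = 100.0 * standard_error_reduction(gain)
    print(f"gain {gain}: standard error about {pct:.0f}% lower")
```

With the gains quoted in the text, this gives reductions of about 31% and 85%, matching the reported range.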
In order to improve the temporal resolution of the information system, a method based on time series analysis is applied, which allows forecasts to be issued at the beginning of each farming campaign. In addition, this method allows a reduction of the error of the crop area estimator without increasing costs: for cereal acreage, the reduction varies (see Table 2) from one Spanish Autonomous Community to another, with values ranging from 1.10% (Aragon) to 2.50% (Extremadura). For other crops, the reductions are of similar magnitude.
It is shown that agro-meteorological models are a useful tool for forecasting crop yields depending on weather conditions, and that the forecast error ranges from 3.85% for unirrigated olive groves for oil production to 16.41% for unirrigated sunflower. Crop rotation models are estimated and validated using panel area samples, and the results are satisfactory, as can be seen in Figures 1, 2 and 3. Various uses for crop rotation models are described, including the updating and disaggregation of information in territorial censuses (SIGPAC) and the development of spatiotemporal econometric models. These last models are especially useful for monitoring and controlling the use of natural resources (soil, water and air) by agriculture and the waste from synthetic products used in agriculture, such as pesticides and chemical fertilisers.

Table 1. Gain in precision of the BLUE crop area estimator with respect to the direct estimator (DE) of crop area currently used in ESYRCE

Table 2. Gain of the cereal area BLUP estimator with respect to the DE estimator currently used in ESYRCE

Ambrosio et al. / Span J Agric Res (2009) 7(4), 957-973