Typology, classification and characterization of farms for agricultural production planning

Agricultural activity in Galicia, North West Spain, is carried out on farms that are characterized by a diversity of land use and production models, a variety of farm sizes, and considerable geographical dispersion. Any attempt at agricultural production planning aimed at characterizing production models requires a method for analysing data and obtaining technical-economic results from farms in the different areas. Models based on average statistical data are limited because they represent farms that do not exist in reality. This study develops a methodology to characterize and group dairy farms into different types according to the following basic variables: land use, size classes and production systems. The information used in this study was microdata from the 1999 Census of Agriculture. The methodology developed was also applied to microdata from the 1989 Census of Agriculture, thus obtaining significant information about the evolution of agricultural activity. The tools used in the analysis were Microsoft Access and Excel, and an application that was developed using Microsoft Visual Basic. The methodology presented can be used to analyse the evolution of the sector or to model future trends.
Additional key words: agricultural census, dairy farm types, Galicia, rural development.

methods have been developed to allocate agricultural uses to land areas. Since then, different research lines have been followed (Rossiter, 1996). There are two main lines of research that differ in the unit of analysis. In the first line, the land is analyzed and the analytical results are extrapolated to farms. In this approach, the first methods used were multi-criteria evaluations based on mathematical programming (Voogd, 1983). Later, cellular automata (Parker et al., 2003) and heuristic models were used because of lower computing costs and the versatility of the solutions from such models (Nalle et al., 2002; Boyland et al., 2004). Other authors combined the mathematical basis with social participation (De Wit and Van Keulen, 1988; Leitner et al., 2002; Snyder, 2003). In the second line of research, the farm is taken as the unit of analysis (Loftsgard and Heady, 1959) and the results are used to determine crop areas allocated to each species (Duloy and Norton, 1983; Hwang et al., 1994). In this approach, the results can be applied to a whole territory (Glenj and Tipper, 2001) or used for strategic land use planning (Carsjens and Van Der Knaap, 2002). Given that the previous methods were developed to model the behaviour of single farms, it is necessary to determine the representativeness and characteristics of farms for a given territory or sector (Thenail and Baudry, 2004). The models used to classify farms can be reduced to three approaches (Kostov and McErlean, 2006). The first is to choose an average farm and assume that the other farms are linearly related to the characteristics of the chosen farm. In the second approach, more than one farm is required to adequately represent the farm population (and to minimize variability). Usually, multivariate statistical classification techniques are used to conduct such a selection, mainly hierarchical clustering (Everitt, 1993).
The difficulty with this type of classification lies in deciding the number of clusters and the representativeness of each cluster. In the last few years, models have been developed that solve the problem of the number of groups, such as the mixture of distributions model (MDM) (Kostov and McErlean, 2006). A third approach consists in determining the evolution of farms from official statistical indicators, mainly based on economic criteria. Reidsma et al. (2006) used data from the European Farm Accounting Data Network (FADN) to model the behaviour of European agriculture. At the regional level, the farms present in the network seem to have both geographical and economic representativeness (Judez and Chaya, 1999). Chatelier et al. (2000) proposed a system for reclassification of dairy farms based on the classes proposed by the FADN. Such a reclassification system conforms better to the variety of possible production processes and to future sustainability.
With a view to grouping relatively homogeneous farms into disjoint classes, the European Union (EU) uses two essential farm characteristics for classifying agricultural holdings: type of farming (TF) and economic size. Both the TF and the economic size are determined on the basis of standard gross margin (SGM). Economic size is determined by the total SGM of the farm; TFs are defined in terms of the relative importance of the different enterprises on the farm, measured as the proportion of each enterprise's SGM to the farm's total SGM (RICA, 1988). The economic size of farms is expressed in terms of economic size units (ESU). One ESU corresponds to €1,200 of farm SGM.
A particular farm is assigned to a particular TF when at least 2/3 of the farm's total SGM is contributed by that TF. Each TF is made up of four levels of disaggregation: the first level is represented by one digit and corresponds to TF; the second level is represented by two digits and corresponds to principal TF; the third level is represented by three digits and corresponds to a particular TF; and the fourth level is represented by four digits and corresponds to subdivisions of a particular TF. Table 1 shows the distribution of 15,000 commercial dairy farms in Galicia included in the 1999 Census of Agriculture and classified in terms of TFs and ESU. The table reflects the difficulty of establishing such a classification, given the number of different TFs (up to 25) and the variation in economic size, which yields an average of 12.66 ESU with a SD of 15.45.
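The SGM-based rule described above can be illustrated with a minimal sketch. It assumes a farm is represented as a mapping from enterprise name to SGM in euros; the function names and the sample figures are hypothetical, and returning `None` when no enterprise reaches the 2/3 share is a simplification, since the EU typology has its own rules for mixed holdings.

```python
from typing import Dict, Optional

ESU_EUR = 1200.0  # one ESU corresponds to EUR 1,200 of SGM (value given in the text)


def economic_size_esu(sgm_by_enterprise: Dict[str, float]) -> float:
    """Economic size of a farm in ESU: total SGM divided by EUR 1,200."""
    return sum(sgm_by_enterprise.values()) / ESU_EUR


def dominant_tf(sgm_by_enterprise: Dict[str, float],
                threshold: float = 2 / 3) -> Optional[str]:
    """Return the enterprise contributing at least `threshold` of total SGM.

    Returns None if no single enterprise reaches the share
    (a simplification made for this sketch).
    """
    total = sum(sgm_by_enterprise.values())
    if total <= 0:
        return None
    name, value = max(sgm_by_enterprise.items(), key=lambda item: item[1])
    return name if value / total >= threshold else None


# Hypothetical farms; the SGM figures are invented for illustration only.
specialised = {"dairy": 18000.0, "beef": 4000.0, "potatoes": 2000.0}
mixed = {"dairy": 10000.0, "beef": 10000.0}

print(economic_size_esu(specialised), dominant_tf(specialised))
print(economic_size_esu(mixed), dominant_tf(mixed))
```

The first farm derives 75% of its SGM from dairying and is therefore assigned to it; the second has no enterprise above the 2/3 share.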
The dairy sector is the main agricultural sector in Galicia, and the most complex in terms of farm structure (agriculture and livestock are integrated), geographical dispersion (the sector is present in dissimilar land areas), size classification, and complexity of production. Barbeyto (1998) conducted an analysis of the Autonomous Community of Galicia. He used data from farms that belonged to different farm management associations, and classified dairy farms into two groups, a head group and a tail group, in terms of farm net yield per litre of milk. Such a classification can be applied to more than two groups and enables a comparison between different technical and economic indicators for farms with different levels of economic efficiency. This study contributes an innovative approach in that all the farms are analyzed taking into consideration all the variables that affect their technical, economic and environmental viability (Zhen and Routray, 2003). Thus, farms are aggregated into groups based on the values of each variable and the representativeness of each group is related to the number of farms in which a variable occurs.
Another contribution in Galicia was made through the «Study of Agricultural Production Planning» (Xunta de Galicia, 2004). A model was developed for analysis of different agricultural and forestry land uses in Galicia, both current and potential, and implemented as spreadsheets based on Excel 2000 ® (Riveiro et al., 2005). Such a model produces technical-economic results that can be integrated with other indicators (structural, social and environmental), enabling the development of a decision-making system that allocates priority activities to each territorial area in the region. This study aims to develop a systematic process for determining different types, production sizes, production systems and locations of farms based on data from the Census of Agriculture (Table 2). The process must allow for the introduction of more specificity in any production planning process. In addition, this study attempts to characterize the evolution of agricultural activity and to predict production trends by analysing information from the consecutive 1989 and 1999 Censuses of Agriculture. The novelty of the work lies in establishing different groups of holdings from the total population, using existing data from the agricultural census, which allows accurate characterization.

Material and methods
The working universe of data considered in this study was microdata from the 1989 and 1999 Censuses of Agriculture (INE, 1989, 1999) that corresponded to Galicia. In 1999 there were 270,053 farms that were distributed over 696,691 ha of utilised agricultural area.
From among all the farms, dairy farms were chosen. From the 1999 census, we selected farms that sold milk (the data is in the census). From the 1989 census, we selected farms with 10 or more dairy cows (there was no information on farms that sold milk). This selection involved 15,000 dairy farms from the 1999 census and 14,834 from the 1989 census (Table 3).
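The two selection rules can be written as a simple filter. The sketch below is illustrative only: the record layout (`FarmRecord` and its field names) is hypothetical and does not reproduce the structure of the census microdata or of the Access database used in the study.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FarmRecord:
    farm_id: str
    dairy_cows: int
    sells_milk: Optional[bool] = None  # None where the census lacks this field (1989)


def is_dairy_farm(rec: FarmRecord, census_year: int, min_cows: int = 10) -> bool:
    """Apply the selection rule stated in the text for each census year."""
    if census_year == 1999:
        # 1999: farms that sold milk (recorded in the census)
        return rec.sells_milk is True
    if census_year == 1989:
        # 1989: no milk-sales field, so farms with 10 or more dairy cows
        return rec.dairy_cows >= min_cows
    raise ValueError(f"no selection rule defined for census year {census_year}")


# Invented sample records, for illustration only.
sample_1999 = [FarmRecord("a", 25, True), FarmRecord("b", 3, False)]
sample_1989 = [FarmRecord("c", 12), FarmRecord("d", 9)]

selected_1999 = [r.farm_id for r in sample_1999 if is_dairy_farm(r, 1999)]
selected_1989 = [r.farm_id for r in sample_1989 if is_dairy_farm(r, 1989)]
print(selected_1999, selected_1989)
```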
An operational procedure was developed for grouping farms with common characteristics (type and size) and for determining the spatial distribution of farms (location) in a systematic manner. After the different groups were established, each group could be characterized separately, further information could be searched, and representative samples could be selected to carry out targeted surveys (focused on production processes, yield and consumption).
Tools used in the analysis were Microsoft Access and Excel, and an application developed using Microsoft Visual Basic. The application generated the possible combinations of land uses automatically and gave the composition and number of farms under each combination. This process was conducted in successive stages, and gave intermediate or final results. Figure 1 gives the procedure for generation of TFs. It consisted of establishing groups of farms composed of a particular combination of productive land uses that caused differences in the technical-economic results of the farm as compared to other combinations. A database was built from the microdata included in the most recent census of agriculture (1999). This database contained information about all the farms and showed a [...]
The threshold is used as the primary classification parameter for removing non-commercial farms from the classification. Given the frequency of multi-crop farms in Galicia, particularly among small farms, this process is necessary to obtain a useful long-term methodology.
The process described above is essential because multi-crop and self-supply farms are not viable, which has caused a dramatic decrease in the number of farms in the last few years while total production levels have been maintained in Galicia. For example, Table 3 shows figures for the evolution of the dairy sector in Galicia between 1989 and 1999. The use of this threshold is essential to reflect the current situation of the dairy production sector based on the average data of all farms, including self-supply farms. However, this threshold can be modified or removed.
The data for the reference variables used during the process of TF generation (> 50 fields) were extracted from the «land use table», which was the base table. Then, farm distribution was analysed for each land use according to production size, and a minimum threshold was defined based on this distribution. The minimum threshold was considered by the system for generation of the different TFs. In addition, the minimum number of farms required to form a TF was established, with a view to generating representative TFs, i.e., types that are composed of a minimum number of farms.
The critical thresholds and parameters that act as dividers for the classification process are variable, i.e. the user may set them as appropriate for the situation under analysis. In this study, they were established based on the information available, with a view to choosing only those farms where each land use or activity was of a minimum commercial size, so that self-supply farms without commercial significance were not included in the classification.
The study established minimum thresholds for other land uses or activities, included in the census, that could be combined with dairying: permanent grassland, multi-annual forage crops, corn silage, other pastures, annual forage crops, cattle rearing, beef cattle, potatoes, hardwood, wheat, softwood and mixed forest species.
The thresholds were established to guarantee the commercial size of the farm. The methodology of Riveiro et al. (2005) was used to determine the economic size of a farm based on economic yield. The values used to establish the thresholds and parameters are defined for the different regions of Galicia in Xunta de Galicia (2004), considering that a farm is a dairy farm only if the land use or enterprise combined with dairying does not exceed 50% of farm production.
The minimum value adopted for corn silage, alfalfa or other pasture species was 0.5 ha because any farm with 10 or more cows must have an area of more than 0.5 ha under these crops for feeding cattle. Additional livestock activities such as beef cattle breeding, which were present on many farms, were only considered when there were 5 or more beef cows on the farm.
The minimum number of farms per type was set at 10, and the minimum percentage of farms out of the initial group of farms was set at 10%. By establishing these constraints, which can be modified, unrepresentative TFs could be eliminated at the start of the process by setting a minimum percentage, or at the end of the process by using an absolute value.
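The thresholds and representativeness constraints described above can be sketched as follows. The field names are hypothetical, only the thresholds quoted in the text (0.5 ha and 5 beef cows) are included, and applying the absolute and the percentage constraint together in one function is an assumption of this sketch; the text notes that they can be applied at different points of the process.

```python
from typing import Dict, FrozenSet

# Minimum commercial size per land use or activity (values quoted in the text).
# The keys are hypothetical field names, not census variable names.
MIN_SIZE: Dict[str, float] = {
    "corn_silage_ha": 0.5,
    "other_pastures_ha": 0.5,
    "beef_cows": 5,
}


def present_uses(farm: Dict[str, float],
                 min_size: Dict[str, float] = MIN_SIZE) -> FrozenSet[str]:
    """Land uses or activities that reach their minimum commercial size."""
    return frozenset(use for use, minimum in min_size.items()
                     if farm.get(use, 0) >= minimum)


def is_representative(n_in_type: int, n_initial: int,
                      min_farms: int = 10, min_percent: int = 10) -> bool:
    """Check both constraints: absolute number of farms and share of the initial group."""
    return n_in_type >= min_farms and n_in_type * 100 >= min_percent * n_initial


# Invented farm: the corn silage area is below the threshold and is ignored.
farm = {"corn_silage_ha": 0.3, "other_pastures_ha": 2.0, "beef_cows": 6}
print(sorted(present_uses(farm)))
print(is_representative(1500, 15000), is_representative(1499, 15000))
```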
Having established these conditions, the process of combination generation was started. The process was implemented automatically in successive stages. The number of stages varied depending on the complexity of the TFs obtained, with a maximum of 10 stages, which was considered an appropriate value for this research.
The results of each stage are stored in spreadsheets, where fields that contain the characteristics of primary and secondary land uses or activities on each farm are qualitative. Conversely, fields that define farm size are quantitative, based on surface area or the number of cows.
Using 10 stages potentially allows grouping of farms, i.e., once all the farms with 10 or more dairy cows were selected, these farms were combined with the other land uses or activities present on the farms that were included in the census, a total of 12 TFs. The next stage would consist in combining each combination of two land uses or activities (dairy cattle + any of the land uses mentioned above) with the other 11 possible combinations. This would yield a total of 132 combinations (12*11). The third stage would consist in relating each of the 132 combinations obtained with the other 10 possible combinations, which would yield a total of 1320 (12*11*10) combinations at this stage. Potentially completing 10 stages would yield 239,500,800 (12*11*10*9*8*7*6*5*4*3) TFs. Assuming that all those TFs could exist, the process would become a farm enumeration rather than a classification.
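The counts quoted above are ordered selections without repetition, and they can be checked with a short sketch (Python 3.8 or later for `math.perm`; the variable names are illustrative).

```python
import math

N_USES = 12      # land uses or activities that can be combined with dairying
MAX_STAGE = 10   # maximum number of stages used in the study

# Ordered selections of `stage` land uses out of 12: 12, 12*11, 12*11*10, ...
ordered_counts = {stage: math.perm(N_USES, stage)
                  for stage in range(1, MAX_STAGE + 1)}

for stage, count in ordered_counts.items():
    print(f"stage {stage:2d}: {count:,}")
```

Stage 10 gives 239,500,800 ordered selections, far more than the number of farms, which is why counting order is not a workable basis for classification.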
Statistically, the above process is reduced by considering combinations (no element is repeated in the set considered and sets with a different order of elements are not counted). The number of possible combinations at stage $k$, i.e. with $k$ of the $n$ land uses that can be combined with dairying, is estimated by the following equation:
$$C(n,k) = \binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$
The results for 12 land uses are given in Table 4, where stage 1 shows the 12 combinations of two elements included in Table 5 and the cumulative number of combinations up to stage 10 corresponds to 4,082 different TFs.
Theoretically, considering the highest number of potential land uses considered in the census (50) and the highest order (10), the maximum number of combinations is up to $8.22 \times 10^{9}$, and the maximum cumulative number is $1.08 \times 10^{10}$. As the agricultural census only includes 250,073 farms, such a level of disaggregation is excessive. Moreover, Excel does not allow for more than 65,536 rows (combination records). Therefore, in theory, going beyond stage 5 or 6 would be useless.
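The figures in the two preceding paragraphs are consistent with unordered combinations, as the following sketch shows (Python 3.8 or later for `math.comb`). Reading the 50 census land uses as dairying plus 49 other uses is an inference made here because it reproduces the quoted values; it is not stated in the text.

```python
import math


def stage_counts(n_uses: int, max_stage: int = 10) -> dict:
    """Number of unordered combinations of `stage` land uses out of `n_uses`."""
    return {stage: math.comb(n_uses, stage) for stage in range(1, max_stage + 1)}


dairy_12 = stage_counts(12)            # 12 land uses combinable with dairying
cumulative_12 = sum(dairy_12.values())

census_49 = stage_counts(49)           # assumption: 50 census uses minus dairying
cumulative_49 = sum(census_49.values())

print(dairy_12[1], dairy_12[6], cumulative_12)
print(f"{census_49[10]:.2e}", f"{cumulative_49:.2e}")
```

With 12 land uses the sixth stage yields 924 combinations and the cumulative total over ten stages is 4,082; with 49 land uses the tenth stage alone yields about $8.22 \times 10^{9}$ combinations.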
The work developed for classifying dairy farms becomes redundant at the sixth stage, in which the number of TFs generated (924) covers the total number of farms (15,000). Therefore, why is it interesting to study further stages? It is interesting because of a characteristic of the groupings made by the application: such groupings are not disjoint, i.e., farms can meet the conditions of various TFs at each stage. For this reason, it is essential to achieve higher disaggregation levels to group farms into TFs according to the most significant factors. In this case, after having achieved a high level of TFs, it was established that the most significant factors were the presence or absence of corn silage, and the presence of beef cattle as complementary activities. As the census did not include any beef cattle farms that used corn silage, three TFs were established.
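The staged generation of combinations, and the fact that the resulting groups are not disjoint, can be illustrated with a small sketch. This is not the Visual Basic application used in the study: the function `generate_types`, the data layout and the stopping rule (stop when a stage yields no combination with enough farms) are assumptions, and the farms are invented.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple


def generate_types(farms: Dict[str, FrozenSet[str]],
                   uses: List[str],
                   max_stage: int = 10,
                   min_farms: int = 10) -> Dict[Tuple[str, ...], List[str]]:
    """Stage by stage, list every combination of land uses and the farms showing it.

    A farm belongs to a combination if it has all the land uses in it,
    so one farm can appear under several combinations (groups are not disjoint).
    """
    results: Dict[Tuple[str, ...], List[str]] = {}
    for stage in range(1, max_stage + 1):
        found_any = False
        for combo in combinations(sorted(uses), stage):
            members = [fid for fid, present in farms.items()
                       if present.issuperset(combo)]
            if len(members) >= min_farms:
                results[combo] = members
                found_any = True
        if not found_any:
            break  # no representative combination at this stage
    return results


# Three invented dairy farms and the complementary land uses present on each.
farms = {
    "A": frozenset({"grass", "corn"}),
    "B": frozenset({"grass"}),
    "C": frozenset({"grass", "beef"}),
}
types = generate_types(farms, ["grass", "corn", "beef"], min_farms=1)
for combo, members in types.items():
    print(combo, members)
```

Farm A appears under `("corn",)`, `("grass",)` and `("corn", "grass")`, which is the overlap that makes higher disaggregation levels and a final choice of the most significant factors necessary.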
In addition, the possibility of working with 10 stages would be interesting for classifying land uses or activities that are present in a larger number of farms or in wider geographical areas.
Analysis of farms grouped in each TF was conducted by considering different size classes. By default, the application considers five size classes (C1 to C5, Table 5), which allows the user to search for curves or trends that represent a different behaviour depending on the farm size, thus simplifying the process. The different size classes were established on the basis that most Galician farms are family farms. It was assumed that the reference farm was a farm with about 40 dairy cows, in which the labour force corresponded to the labour force available in an average family (Maseda et al., 2004). Spreadsheets were also used to characterize production sectors. Through the use of macros, spreadsheets allow the user to recover data for all the farms included in a representative group and to estimate values of the variables. There are many possibilities for data recovery. Thus, the user can recover data for farms included in one or more TFs, in one or several classes, in one municipality, in several municipalities or in the whole region.
Finally, an analysis can be performed to determine the variability of the dairy cattle sector among the different farm groups, classified according to TF, size class and location. Such an analysis is conducted by retrieving from the land use table (dairy cattle table) the data for all the variables considered in the agriculture census for farms assigned to each group. The number of farms of each type and class in each municipality was automatically determined, so it is possible to study their spatial distribution (Fig. 2).
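Counting farms by municipality, type and class amounts to a grouped tally. A minimal sketch follows, assuming each farm has already been assigned a TF and a size class; the record keys and the municipality codes are placeholders.

```python
from collections import Counter
from typing import Dict, Iterable


def farms_per_group(assignments: Iterable[Dict[str, str]]) -> Counter:
    """Number of farms for each (municipality, TF, size class) combination."""
    return Counter((a["municipality"], a["tf"], a["size_class"])
                   for a in assignments)


# Invented assignments; "M01" and "M02" are placeholder municipality codes.
assignments = [
    {"farm_id": "f1", "municipality": "M01", "tf": "TF1", "size_class": "C1"},
    {"farm_id": "f2", "municipality": "M01", "tf": "TF1", "size_class": "C1"},
    {"farm_id": "f3", "municipality": "M01", "tf": "TF2", "size_class": "C3"},
    {"farm_id": "f4", "municipality": "M02", "tf": "TF3", "size_class": "C2"},
]

counts = farms_per_group(assignments)
for group, n in sorted(counts.items()):
    print(group, n)
```

A table of this kind, keyed by municipality, is what would be exported to a mapping tool to draw the spatial distribution.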
Moreover, evolution of the sector can be determined based on the information available from two consecutive censuses. The 1989 Census of Agriculture was analysed using the TFs and size classes defined for the year 1999. Analysis of the two censuses aimed at characterizing the evolution of the sector in terms of TFs, size classes and location.

Results
Based on the reference farm, the five size classes shown in Table 3 were built (C1, C2, C3, C4 and C5). Farms with fewer than 10 dairy cows were excluded because they were not considered economically viable. Table 3 shows the number of farms in both censuses. The number of farms has decreased, particularly in the case of the smallest farms, while the number of large farms has increased.
Considering the constraints established, a total of 12 first-order combinations (with two land uses) were obtained. The combinations obtained met the pre-established minimum conditions and resulted from combining dairy cattle with permanent grasslands, other pastures, wheat, potato, corn silage, annual forage crops, multi-annual forage crops, beef cattle, cattle rearing, hardwood, softwood and mixed forest uses. Table 5 shows the structure of land uses and size classes generated by the system for first-order combinations, and the number of farms in which first-order combinations were present. Table 5 also shows the results based on the 1989 Census of Agriculture, which did not consider corn silage (this crop was not an independent entry in the 1989 Census and was included in annual forage crops). Between 1989 and 1999, significant changes occurred, among them a decrease in permanent grassland, other pastures or annual forage crops compared to multi-annual forage crops or corn silage, and an increase in cattle rearing. There was also an increase in forestry land use on dairy farms with hardwood and mixed forest species, compared to softwood, which barely maintained the same values. These results are not the final stage of the classification algorithm. Rather, they illustrate the results that can be obtained from agricultural censuses once farms are grouped according to the TFs obtained from the method.
Analysis of data from the 1999 census shows that pasture species, corn, and beef cattle breeding (as an activity that is integrated in some dairy farms) must be considered because of their high frequency and their influence on the productive process of holdings. The following three TFs result from first-order combinations:
- TF1. Dairy farms in which milk production is associated with land and based on pasture and forage crops without presence of corn crops, regardless of the presence of other complementary activities or land uses such as cattle rearing, potato crops or forest land (among the most common ones). In this type of farm, beef cattle breeding stock cannot exceed 4 animals.
- TF2. Identical to TF1, but with corn crops present.
- TF3. Dairy farms in which milk production is associated with the presence of beef cattle breeding stock of ≥ 5 animals, irrespective of the presence of forage or other complementary activities or land uses such as cattle rearing or fattening, potato crops or forest land, among the most common ones.
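The three definitions above translate into a short decision rule. In the sketch below, the function name and arguments are hypothetical, and checking the beef cattle criterion first is an assumption; it has no practical effect on the 1999 data, since the census included no beef cattle farms that used corn silage.

```python
def assign_tf(has_corn: bool, beef_cows: int, beef_threshold: int = 5) -> str:
    """Assign a dairy farm to one of the three types of farming.

    TF3: beef cattle breeding stock of 5 or more animals, whatever the forage base.
    TF2: pasture and forage based, with corn crops, fewer than 5 beef cows.
    TF1: pasture and forage based, without corn crops, fewer than 5 beef cows.
    """
    if beef_cows >= beef_threshold:
        return "TF3"
    return "TF2" if has_corn else "TF1"


# Invented examples covering each type.
examples = [(False, 0), (True, 4), (False, 5)]
labels = [assign_tf(corn, beef) for corn, beef in examples]
print(labels)
```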
These TFs were applied to the data from the 1989 and 1999 censuses to obtain the number of farms included in each size class according to each TF (Table 6). Analysis of Table 6 shows a clear difference between the 1989 and the 1999 censuses with respect to the number of farms in each TF. As regards the evolution of land use, the 1999 census incorporates corn silage (which was not considered in the 1989 census), and shows a less significant increase in the presence of beef cattle, which occurs mainly in the intermediate size classes (C2 and C3).
More specifically, the comparison of data between both censuses reveals that the TFs established do not coincide. TF1 and TF3 would better fit production systems observed in 1989. The census data confirm that milk production in Galicia around 1989 was mainly associated with production of forage crops based on grassland species that provided fresh forage during the spring-autumn period and dry forage (hay) as a winter reserve. A smaller number of farms combined this diet with annual forage crops such as corn, which was consumed green at the end of the summer, with turnips during winter or, in some cases, in early spring, and with forage from autumn-sown grain crops (e.g. wheat, rye, oats). This was the prevailing animal diet pattern around 1970. At that time farm systems were less mechanized, and grazing was more relevant, as confirmed by the higher labour force levels, lower mechanization level and greater levels of permanent grasslands and other pastures.
Considering the evolution of the different TFs, only some aspects of TF1 and TF3 from the 1999 Census would admit comparison with the potentially homologous TFs from the 1989 Census. Retrospectively, TF1 for 1989 grouped all farms (or most of them) based on pasture and forage crop production, with an increasing presence of silage from these crops and similar management systems, which substantially coincides with the homologous type from the 1999 Census. Although not comprehensively studied, the TF with the highest similarity in terms of structure and evolution of management systems is TF3 (which includes beef cattle). The forage base is similar, the management systems are extensive or semi-extensive, and most of them maintain grazing.
The spatial distribution of the different TFs was analysed at a municipal scale based on the three TFs obtained and on the corresponding size classes. By exporting these values to a GIS application or to an Excel application, maps can be generated for the whole of Galicia with the corresponding number of farms. Figure 2 is an example and shows the spatial distribution of part of the results included in Table 6. Considering average tractor power (Fig. 3) per unit of utilised agricultural area (UAA) as an indicator of farm mechanization level, it can be concluded that the farms belonging to TF1 and TF2 had a similar mechanization level, while mixed farms (TF3) had lower values, with differences that were more evident for the smallest farms. The mechanization level per UAA and, consequently, per CU decreases with increased farm size.
The labour force employed on farms per cow unit (LU) present on the farm (Fig. 4) decreases with decreased farm size. The values for TF1 farms are higher than those for TF3. Differences between the two TFs are most evident for C1 farms, while the differences disappear on C5 farms. Comparison between the 1989 and 1999 censuses suggests similar trends for this variable. The trends in 1989 are more erratic, perhaps due to the lower number of farms included in each TF: there were only 23 farms in TF3, classes C3-C5, in 1989. However, values for 1989 are higher than for 1999. This could be explained by lower technology use on the farms.

Discussion
In 1999, the forage base on farms in Galicia was focused on production of annual or multi-annual pasture crops that are transformed into silage or, to a lesser extent, into hay for consumption over the whole year. On an increasing number of farms, pastures were combined with corn silage for year round consumption. These crops are used as ingredients of rations that can be adjusted by using concentrates according to farmers' needs, thus reducing, or avoiding, seasonality of milk production. This is associated with management systems, which show a high level of mechanization, less use of labour per unit (per animal), and reduction or cessation of grazing. Currently, the production system shows important differences, even with respect to the data in the 1999 census.
The methodology for establishment of a typology, classification and characterization of farms presented in this study is implemented in different phases and is based on automated analysis of data contained in the census of agriculture. This methodology enables the user to group, at the municipal level, all farms that show the same production pattern in a previously established size class. The spatial distribution of each reference pattern is determined based on these groups of similar elements. Determination of spatial distribution of TFs is essential to choose a reduced number of farms that are representative of each type and to give updated information about the production process (operations, raw materials, labour) and structure (size, machinery, facilities, land) of the farms. This information is useful for characterizing the production patterns (type-class combinations) that are representative of each activity. Such production patterns complete the information of the agricultural production-planning model developed.
The methodology presented here can be used to analyse the evolution of a sector or to model future trends. The comparison between the 1989 and 1999 censuses reveals changes in both farm size and production structure. The dairy sector in Galicia has restructured. The main TFs from the 1999 Census did not necessarily evolve from the TFs present in 1989. For example, the TF based on the production of corn silage emerged as a new TF during the 1990s. Before that, different winter forages, such as turnips or spring cereals, were grown. With the increase in the average farm size, these systems evolved towards grass silage, which could be mechanized. Grass silage was used as a reserve feed source for winter. This pattern has further evolved towards the production of corn silage, which is now common on most farms, as suggested by the evolution of the different land uses and by the variations observed in the farms included in the different size classes. This model can be adapted to other regions and countries by making slight modifications in the structure of the software application used. In addition, the model can be further applied to specific geographical areas such as parishes, municipalities or agricultural regions to adjust the different TFs and size classes according to specifications of the physical environment.
The methodology proposed in this study obtains information from a very significant statistical source, the agricultural censuses. This valuable source is underutilised, perhaps because of a lack of tools to handle the large volume of data it contains. The use of census data is usually restricted to data abstracts, in which data are grouped and systematized, thus providing some information. The traditional trend is to work with census abstracts that provide characterization of a production system or region by grouping data (Reidsma et al., 2006). This approach changes dramatically if microdata are used, i.e. when there are individual data (the results of the census) available for each farm included in the census.
Individual data allows the division of farms into differential groups, which shows a clear potential for the use of the information in agricultural censuses. This approach is completely innovative.
This approach was applied to dairy farms in Galicia, and allowed the establishment of relational TFs with the dairy production system or the presence of complementary activities and/or land uses. Grouping farms into homogeneous groups enables one to: i) study the location of the different groups to determine if there is any relationship with environmental factors that can explain the differences; ii) obtain census data that are grouped for each TF to present differences among groups, i.e. to characterize each TF; and iii) carry out a field survey that actually represents the dairy sector by stratifying the population by types and classes.
These analyses can be applied to agricultural production planning at different levels: public administration, individual farmers or farmers' associations.
Further, the frequency of agricultural censuses (every 10 years) and the frequency of approval by the EU allow for analyses and forecasts across time and space.
The methodology, explained above, is compared with traditional classifications based on TFs and ESUs (Andersen et al., 2007). Table 1 shows the large number of TFs (25 for farms that sell cow milk) and the large variation in ESUs (from 6 to more than 300). Traditional classifications do not allow the user to make a specific analysis of the sector or to fully use information from agricultural censuses. Moreover, these classifications cannot be applied to agricultural production planning.
The data show that the number of farms with fewer than 10 cows has decreased dramatically during the last few years. Such farms are self-supply/subsistence farms and are not economically viable. This situation persists today. According to values for 2003, of 27,106 farms in Galicia, 14,183 have 10 or more cows and 12,923 have fewer than 10 cows (7% of the total number of cows and < 8% of total yield). Milk production in Galicia has increased every year despite the dramatic decrease in farms with fewer than 10 cows. Based on this, 10 cows was considered an appropriate minimum value.
The results of the study can be extrapolated because agricultural censuses published in Spain and in all the member states of the EU are harmonized. Therefore, the methodology proposed in this study can also be developed for those countries.