Relations between zero-inflated variables in trials with horticultural crops

Alessandro D. Lúcio, Luis F. Nunes, Francisco Rego and Maurício P. B. Pasini. Spanish Journal of Agricultural Research, June 2016, Volume 14, Issue 2, e0906.

Certain characteristics of some vegetable crops allow multiple harvests during the production cycle; however, to our knowledge, no study has described the behavior of fruit production as the production cycle progresses in multiple-harvest vegetable crops whose data present overdispersion. We aimed to characterize the data overdispersion of zero-inflated variables and to identify the behavior of these variables during the production cycle of several vegetable crops with multiple harvests. Data from 11 uniformity trials conducted without applying treatments were used; these comprise the database of the Experimental Plants Group at the Federal University of Santa Maria, Brazil. The trials were conducted with four horticultural species grown in different cultivation seasons, cultivation environments, and experimental structures. Although at each harvest more basic units had harvested fruit than had none, the percentage of basic units without fruit was high, generating overdispersion within each individual harvest. The variability within each harvest was high and increased as the production cycle progressed in Capsicum annuum, Solanum lycopersicum var. cerasiforme, Phaseolus vulgaris, and Cucurbita pepo. However, the correlation coefficient between the mean weight and the number of harvested fruits tended to remain constant during the crop production cycle. These behaviors show that harvest management should be done individually, at each harvest, so as to reduce data overdispersion.
Additional key words: multiple harvests; data overdispersion; experimental planning.
Abbreviations used: BU (basic unit); CV (coefficient of variation).
Authors' contributions: Conceived and designed the experiments: ADL, LFN and FR. Performed the experiments: MPBP. Analyzed the data: ADL. Wrote the paper: ADL, LFN, FR and MPBP.
Citation: Lúcio, A. D.; Nunes, L. F.; Rego, F.; Pasini, M. P. B. (2016). Relations between zero-inflated variables in trials with horticultural crops. Spanish Journal of Agricultural Research, Volume 14, Issue 2, e0906. http://dx.doi.org/10.5424/sjar/2016142-8175.
Received: 17 Jun 2015. Accepted: 12 May 2016.
Copyright © 2016 INIA. This is an open access article distributed under the terms of the Creative Commons Attribution-Non Commercial (by-nc) Spain 3.0 Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Funding: The Brazilian Ministry of Education's Graduate Education Support Agency (CAPES) awarded an overseas post doctorate scholarship (Process BEX 1457/14-4).
Competing interests: The authors have declared that no competing interests exist.
Correspondence should be addressed to Alessandro D. Lúcio: adlucio@ufsm.br.
Table 1. Uniformity trials without treatment application used in the study.
Species and hybrid | Cultivation environment | Growing season | No. of cultivation rows | No. of basic units (BU) per cultivation row | No. of harvests | Harvests (days after sowing or transplanting)
Capsicum annuum, Vidi hybrid | Plastic greenhouse | Summer-Autumn | 10 | 70 | 5 | 65, 79, 95, 124, 129
Capsicum annuum, Vidi hybrid | Plastic greenhouse | Winter-Spring | 10 | 70 | 4 | 47, 54, 61, 68
Phaseolus vulgaris, Macarrão hybrid | Plastic greenhouse | Autumn-Winter | 6 | 36 (double BU) | 4 | 61, 74, 88, 112
Phaseolus vulgaris, Macarrão hybrid | Field | Autumn-Winter | 3 | 42 (double BU) | 4 | 61, 74, 88, 112
Phaseolus vulgaris, Macarrão hybrid | Plastic tunnel | Autumn-Winter | 3 | 42 (double BU) | 4 | 61, 74, 88, 112
Phaseolus vulgaris, Macarrão hybrid | Field | Spring-Summer | 3 | 42 (double BU) | 3 | 70, 91, 99
Phaseolus vulgaris, Macarrão hybrid | Plastic tunnel | Spring-Summer | 3 | 42 (double BU) | 3 | 70, 91, 99
Solanum lycopersicum var. cerasiforme, Lili hybrid | Plastic greenhouse (250 m²) | Spring-Summer | 8 | 40 | 3 | 66, 82, 101
Solanum lycopersicum var. cerasiforme, Lili hybrid | Plastic greenhouse (200 m²) | Spring-Summer | 8 | 30 | 3 | 75, 88, 103
Cucurbita pepo, Caserta hybrid | Plastic greenhouse | Autumn-Winter | 8 | 20 | 12 | 35, 37, 40, 43, 47, 49, 54, 57, 59, 61, 66, 68
Cucurbita pepo, Caserta hybrid | Plastic greenhouse | Spring-Summer | 8 | 20 | 30 | 29, 33, 35, 37, 39, 41, 43, 44, 47, 50, 53, 55, 57, 59, 60, 61, 62, 64, 66, 67, 68, 70, 73, 75, 76, 77, 80, 82, 83, 85


Introduction
In some vegetable species, certain specific characteristics allow multiple harvests during the production cycle. When these harvests are carried out is decided subjectively and varies with the season and with each cultivated species. In experiments with multiple-harvest species around the world, these variations should be considered together with the interference among the variables. Such interference can inflate the residual variance and lead to inadequate estimates in the experimental design because of the lack of adequate information at harvest, favoring overdispersion in the database, with a large number of null values tabulated.
… 2006, 2008; Benz et al., 2015), studying data transformations (Couto et al., 2009), and using the Papadakis method to minimize the effects of excess zeros and the resulting data overdispersion (Lúcio et al., 2016). Lopes et al. (1998), Lorentz et al. (2005), Carpes et al. (2008), and Lúcio et al. (2008) have pointed out significant variability between crop rows and between harvests, regardless of the species used, and have shown that such variability significantly alters the estimates of sample size, type of sampling, plot size and shape, experimental design, and number of harvests needed to adequately differentiate the study treatments.
The relationship between the observed variables (number and weight of fruits harvested) in experiments with vegetable crops, and the behavior of these species during the production cycle, is important because it provides information on how multiple harvests should be planned and carried out. One of the problems associated with repeated measurements is an excess of zero values. One strategy to reduce this problem is to estimate the ideal plot size so that most results have values greater than zero, thereby reducing the variance. Another strategy is to estimate the plot size that provides the smallest variance between the evaluated plots, since researchers often set plot size empirically, based on what is practical for conducting the experiment, the available area, or experience.
In agricultural research, it is common to evaluate the full cycle of a particular species or to compare different treatments for crop development. However, to our knowledge, no studies have described the behavior of fruit production as the production cycle progresses in multiple-harvest vegetable crops whose data present overdispersion.
This study aimed to characterize the data overdispersion of zero-inflated variables and to identify the behavior of these variables during the production cycle of several vegetable crops with multiple harvests.

Material and methods
Data from 11 uniformity trials conducted without applying treatments were used. These comprise the database of the Experimental Plants Group at the Federal University of Santa Maria, Brazil. The trials were performed with hybrids of four horticultural species grown in different cultivation seasons, cultivation environments, and experimental structures (Table 1). Each experimental basic unit (BU) consisted of a single plant in each plant row, except in the Phaseolus vulgaris trials, where each BU consisted of two plants because of the species' indeterminate growth habit and its tendency to climb onto adjacent plants.
During each harvest, the number and weight (in grams) of fruits harvested from each BU were recorded, except in the P. vulgaris trials, where only the harvest weight was recorded. In the S. lycopersicum var. cerasiforme trials, the number of bunches harvested per BU was also recorded.
In each harvest, for the number and weight of fruits and the number of bunches, an initial descriptive statistical analysis was conducted, from which we obtained the percentage of BU with zero values, the minimum and maximum values, the median, the mean, the coefficient of variation (CV, in %), and the degrees of asymmetry and kurtosis. Box-plots were constructed for the number and weight of fruits at each harvest to identify the variability and average behavior of these variables as the production cycle of the evaluated species progressed. Further, we compared the proportions of BU with and without harvested fruit, adopting a 50% probability for the presence or absence of fruits ready to be harvested.
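The per-harvest summary and the 50% proportion comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Action 2.7 procedure: skewness and kurtosis are computed from central moments (kurtosis reported as excess kurtosis), whereas statistical packages often apply small-sample corrections, so values may differ slightly; the proportion comparison is assumed to be a one-sample z test using the normal approximation. The function names `describe_harvest` and `proportion_test_50` are hypothetical.

```python
import math
import statistics


def describe_harvest(values):
    """Descriptive summary of one harvest (one value per basic unit, BU)."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    # Central moments for moment-based skewness and excess kurtosis
    # (assumption: other software may apply small-sample corrections).
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    nan = float("nan")
    return {
        "pct_zero": 100.0 * sum(1 for x in values if x == 0) / n,
        "min": min(values),
        "max": max(values),
        "median": statistics.median(values),
        "mean": mean,
        "cv_pct": 100.0 * sd / mean if mean else nan,
        "skewness": m3 / m2 ** 1.5 if m2 else nan,
        "excess_kurtosis": m4 / m2 ** 2 - 3.0 if m2 else nan,
    }


def proportion_test_50(n_with_fruit, n_total):
    """Two-sided one-sample z test of H0: P(BU has fruit ready to harvest) = 0.5.

    Uses the normal approximation to the binomial; returns (z, p_value).
    """
    p_hat = n_with_fruit / n_total
    z = (p_hat - 0.5) / math.sqrt(0.5 * 0.5 / n_total)
    # 2 * (1 - Phi(|z|)) written with the complementary error function.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value
```

For example, `describe_harvest([0, 0, 3, 5, 2, 0, 7, 4, 1, 0])` (hypothetical fruit counts for ten BUs) reports 40% zeros, a median of 1.5 and a mean of 2.2, with a CV above 100% and positive skewness. With 6 of 10 BUs bearing fruit, `proportion_test_50(6, 10)` gives z ≈ 0.63 and a p-value well above 0.05, so that harvest would not differ from the 50% reference.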
A linear correlation analysis between the mean weight and the number of fruits per BU was also performed. For the Solanum lycopersicum var. cerasiforme trials, we also estimated the correlation coefficient between the mean fruit weight and the number of bunches per BU by species and cultivation season. Next, for each variable, the Shapiro-Wilk test was performed to assess the adherence of the data to a normal distribution, and the Levene test to assess the homogeneity of variances. For all statistical analyses, a 5% probability of error was adopted, using Action software version 2.7.
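The correlation and variance-homogeneity checks can be sketched in plain Python. Assumptions: "mean weight" is taken as the total fruit weight of a BU divided by its number of fruits, so BUs without fruit are dropped because that ratio is undefined; the functions return the t statistic (n − 2 degrees of freedom) and Levene's W (to be compared with an F(k − 1, N − k) distribution) rather than p-values; and the Shapiro-Wilk test is omitted, since in practice it is taken from a statistics library. Which Levene variant Action uses is not stated, so the centering function is left as a parameter. All function names are hypothetical.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson linear correlation r and its t statistic (df = n - 2)."""
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    # t = r * sqrt((n - 2) / (1 - r^2)); infinite for a perfect correlation.
    if abs(r) < 1:
        t = r * math.sqrt((n - 2) / (1 - r * r))
    else:
        t = math.copysign(math.inf, r)
    return r, t


def mean_weight_vs_count(weights, counts):
    """Correlate mean fruit weight per BU with number of fruits per BU.

    Assumption: BUs with no harvested fruit are dropped, because their
    mean fruit weight (weight / count) is undefined.
    """
    kept = [(w / c, c) for w, c in zip(weights, counts) if c > 0]
    return pearson_r([m for m, _ in kept], [c for _, c in kept])


def levene_w(groups, center=statistics.mean):
    """Levene's W for homogeneity of variances; compare with F(k - 1, N - k).

    Each group holds the BU values of one harvest. Passing
    center=statistics.median gives the Brown-Forsythe variant.
    """
    k = len(groups)
    # Absolute deviation of every value from its own group's center.
    z = []
    for g in groups:
        c = center(g)
        z.append([abs(v - c) for v in g])
    n_total = sum(len(zi) for zi in z)
    z_means = [statistics.fmean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (m - z_grand) ** 2 for zi, m in zip(z, z_means))
    within = sum((v - m) ** 2 for zi, m in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within
```

With hypothetical data, `mean_weight_vs_count([0, 30, 50, 100], [0, 3, 5, 5])` drops the empty BU and returns r = 0.5. Two harvests with identical spread, such as `[[1, 2, 3], [11, 12, 13]]`, give W = 0, while a harvest with a much wider spread pushes W upward.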

Results
Lack of adherence to a normal distribution was identified within each harvest, along with variance heterogeneity among the multiple harvests, independent of species, season, cultivation environment, and observed variable, because of the high variance estimates and, consequently, the high CV (Tables 2 to 6). When plotting the variability of the weight and number of fruits (number of bunches in one case) in each of the multiple harvests, independent of the above conditions, we could not identify a similar pattern of variability with progression of the production cycle of the species (Figs. 1 to 4).
When comparing the proportions of BU with and without harvested fruit within each of the multiple harvests, the proportions did not differ in 13.3% of the harvests (10 of the 75 harvests across all study conditions); that is, at those harvests the number of BU with and without harvested fruit was statistically the same. Of the remaining 65 harvests, in 15 (23.1%) the proportion of BU without harvested fruit was significantly greater than that of BU with harvested fruit.
Within each harvest, significant correlation coefficients were found between the mean weight and the number of fruits and/or bunches harvested per BU, with estimates of around 0.6 for C. annuum and S. lycopersicum var. cerasiforme (Figs. 5 and 6). For C. pepo, the estimates varied as the production cycle progressed, were significant, and reached maximum values of around 0.8 (Fig. 8). As previously described, C. pepo presented different characteristics during fruit maturation, which generated results different from those obtained with the other studied species, which showed similar correlation coefficient estimates across the multiple harvests.

Figure 5. Percentage of basic units without harvested fruit and correlation coefficient between the mean fruit weight per basic unit and the number of fruits per basic unit in Capsicum annuum cultivated in a plastic greenhouse in different growing seasons. *: significant difference between the proportions of basic units with and without fruit fit for harvest, at 5% probability of error; +: significant correlation coefficient at 5% probability of error; ns: not significant.
Figure 8 (partial caption). *: significant difference between the proportions of basic units with and without fruit fit for harvest, at 5% probability of error; +: significant correlation coefficient at 5% probability of error; ns: not significant. c) There was no significant difference between the proportions of fruit present or not fit for harvest at 60, 64, 66, 70, 77, 82 and 85 days after transplanting of the seedlings. In the other crops, there was no significant difference between the proportions. d) All correlation coefficients were significant at 5% probability of error, except those obtained 39 and 53 days after transplanting of the seedlings.

Discussion
The high variability and overdispersion identified in the data are a direct consequence of the high number of BUs with observed values equal to zero. This changes the entire behavior of the descriptive statistics, such as the asymmetry and the degree of kurtosis (Tables 2 to 6): in most cases, the data show a positively skewed distribution with a high degree of kurtosis (leptokurtic distribution).
The appearance of fruits on the plants on different days, causing variation in growth among them; early or late maturation of some fruits; and the lack of uniformity in fruit size at harvest, resulting in the inability to define the ideal harvest point, are factors that increase the variability in fruit number and weight, causing data overdispersion and consequent variations in the statistical analysis. Cargnelutti Filho et al. (2004) obtained higher CV values at the beginning and end of the tomato harvest, because the beginning and end of fruit production were not uniform among the plants. In another study on tomatoes, Lúcio et al. (2010) found that the largest fruit production occurred midway through and at the end of the production cycle, and that variability increased due to physiological aspects of the plants as they entered senescence.
Variability increases with the number of harvests. Souza et al. (2002), Oliveira et al. (2005), and Lúcio et al. (2006) recommend that homogeneous variances be maintained during the production cycle of the crop. They further suggest that researchers clearly define the ideal number of harvests to be performed, which must then be planned and executed by considering each row as a block, thereby allowing experimental repetition. Variability was also noted in the studies by Lúcio et al. (2004), Mello et al. (2004), and Lorentz & Lúcio (2009) on C. annuum; Carpes et al. (2008) and Lúcio et al. (2008) on C. pepo; and Storck et al. (2014) on Passiflora edulis. According to these authors, an increase in the number of replicates is recommended, possibly along with an increase in plot size.
The non-adherence of the data to a normal distribution in the C. pepo trials can be explained by the number of harvests, which was larger than in C. annuum and S. lycopersicum var. cerasiforme. With the increased number of harvests in these trials, fewer fruits per BU were observed within each individual harvest, resulting in a tendency toward non-adherence to the normal distribution, but with smaller data overdispersion than in species with fewer harvests and a greater number of harvested fruits per individual harvest (Tables 2 to 6).
These variance behaviors show that harvest management should be done individually, at each harvest, such that data overdispersion is reduced. Appropriately defining the time of each fruit harvest can be a practical alternative, as can defining a time interval for each harvest rather than a specific day. In this way, the number of BU with fruit ready for harvest can be increased, reducing the data range within each of the multiple harvests.
The main characteristic of the studied species, in plants without fruits ready to be harvested throughout the production cycle, was the lack of any reduction in variability over the course of the production cycle. Even when no fruit was harvested in a BU, the variability of the data remained high and kept increasing, because in this case the value at harvest n remained identical to the value obtained at harvest n−1, whereas in BUs with harvested fruit the value increased; thus, the variability of the values tended to increase from one harvest to the next (Figs. 1 and 4). One way to reduce data variability, and thus overdispersion, is to increase the number of BUs with harvested fruit within each harvest, since this will also increase the number and total weight of fruit harvested within each BU. As previously mentioned, a practical and viable way to achieve this is to clearly define the harvest point and to set time intervals for each harvest.
In summary, within each harvest there were more basic units (BU) with than without harvested fruit. However, the percentage of BU without fruit was high, generating data overdispersion within each harvest. The variability within each harvest is high and increases as the production cycle progresses in C. annuum, S. lycopersicum var. cerasiforme, P. vulgaris, and C. pepo. The correlation coefficients between mean fruit weight and number of harvested fruits tended to remain constant during the crop production cycle. These behaviors show that harvest management should be done individually, at each harvest, so as to reduce data overdispersion.

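The Discussion's argument that variability keeps rising, because a BU without fruit at harvest n keeps its value from harvest n − 1, implies that values were accumulated across harvests. A minimal sketch under that reading, using hypothetical yields for four BUs (not data from the trials), shows how the between-BU variance of cumulative totals can grow from harvest to harvest even when the per-harvest yields show no such trend. The names `EXAMPLE_YIELDS`, `cumulative_by_bu`, and `variance_per_harvest` are illustrative only.

```python
import itertools
import statistics

# Hypothetical per-harvest fruit counts: rows = basic units (BU), columns = harvests.
EXAMPLE_YIELDS = [
    [2, 0, 3, 0],
    [0, 0, 0, 1],
    [3, 2, 4, 2],
    [1, 0, 2, 0],
]


def cumulative_by_bu(per_harvest):
    """Running totals per BU: a BU with no fruit at harvest n keeps its
    harvest n - 1 value, while a BU with fruit moves up."""
    return [list(itertools.accumulate(row)) for row in per_harvest]


def variance_per_harvest(table):
    """Sample variance across BUs at each harvest (one value per column)."""
    return [statistics.variance(column) for column in zip(*table)]


if __name__ == "__main__":
    per_harvest = variance_per_harvest(EXAMPLE_YIELDS)
    cumulative = variance_per_harvest(cumulative_by_bu(EXAMPLE_YIELDS))
    print("per-harvest:", [round(v, 2) for v in per_harvest])
    print("cumulative: ", [round(v, 2) for v in cumulative])
```

For this hypothetical table, the variances of the cumulative totals are about 1.67, 4.67, 14.25 and 18.67, whereas those of the per-harvest yields are about 1.67, 1.00, 2.92 and 0.92. Growth is not guaranteed in general: if BUs that lagged behind later catch up, the cumulative variance can fall.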
Figure 1. Box-plots of fruit weight (grams per basic unit) (a, c) and number of fruits per basic unit (b, d) at each harvest in Capsicum annuum uniformity trials in a plastic greenhouse in different growing seasons.

Figure 2. Box-plots of fruit weight (grams per basic unit) (a, d), number of bunches (b, e), and number of fruits (c, f) per basic unit at each harvest in Solanum lycopersicum var. cerasiforme uniformity trials in the spring-summer season under different environmental conditions.

Figure 4. Box-plots of fruit weight (grams per basic unit) (a, c) and number of fruits per basic unit (b, d) at each harvest in Cucurbita pepo uniformity trials in the autumn-winter and spring-summer seasons.

Figure 6. Percentage of basic units without harvested fruit and correlation coefficients between mean fruit weight and number of fruits per basic unit, and between mean fruit weight and number of bunches per basic unit, in Solanum lycopersicum var. cerasiforme cultivated in the spring-summer season in plastic greenhouses of 250 and 200 m². *: significant difference between the proportions of basic units with and without fruit fit for harvest, at 5% probability of error; +: significant correlation coefficient at 5% probability of error; ns: not significant.


Table 2. Descriptive statistics for fruit weight (grams per basic unit) and number of fruits harvested per basic unit in uniformity trials of Capsicum annuum cultivated in different growing seasons.

Table 3. Descriptive statistics for fruit weight (grams per basic unit) and number of fruits and bunches harvested per basic unit in uniformity trials of Solanum lycopersicum var. cerasiforme cultivated in the spring-summer season under different environmental conditions.

Table 4. Descriptive statistics for fruit weight (grams per basic unit) harvested in uniformity trials of Phaseolus vulgaris cultivated in different growing seasons and environmental conditions.

Table 5. Descriptive statistics for fruit weight (grams per basic unit) and number of fruits harvested per basic unit in the uniformity trials of Cucurbita pepo grown in the autumn-winter season.

Table 6. Descriptive statistics for fruit weight (grams per basic unit) and number of fruits harvested per basic unit in the uniformity trials of Cucurbita pepo grown in the spring-summer season.
