Development of a remote sensing-based rice yield forecasting model

This study aimed to develop a remote sensing-based method for forecasting rice yield by considering vegetation greenness conditions during initial and peak greenness stages of the crop; and implemented for “boro” rice in Bangladeshi context. In this research, we used Moderate Resolution Imaging Spectroradiometer (MODIS)-derived two 16-day composite of normalized difference vegetation index (NDVI) images at 250 m spatial resolution acquired during the initial (January 1 to January 16) and peak greenness (March 23/24 to April 6/7 depending on leap year) stages in conjunction with secondary datasets (i.e., boro suitability map, and ground-based information) during 2007-2012 period. The method consisted of two components: (i) developing a model for delineating area under rice cultivation before harvesting; and (ii) forecasting rice yield as a function of NDVI. Our results demonstrated strong agreements between the model (i.e., MODIS-based) and ground-based area estimates during 2010-2012 period, i.e., coefficient of determination (R2); root mean square error (RMSE); and relative error (RE) in between 0.93 to 0.95; 30,519 to 37,451 ha; and ±10% respectively at the 23 district-levels. We also found good agreements between forecasted (i.e., MODIS-based) and ground-based yields during 2010-2012 period (R2 between 0.76 and 0.86; RMSE between 0.21 and 0.29 Mton/ha, and RE between -5.45% and 6.65%) at the 23 district-levels. We believe that our developments of forecasting the boro rice yield would be useful for the decision makers in addressing food security in Bangladesh. Additional key words: food security; MODIS; multi-temporal dataset; normalized difference vegetation index. Abbreviations used: BARC (Bangladesh Agricultural Research Council); BBS (Bangladesh Bureau of Statistics); MODIS (moderate resolution imaging spectroradiometer); NDVI (normalized difference vegetation index); NIR (near infrared); R (red); R2 (coefficient of determination); RE (relative error); RMSE (root mean square error). Authors’ contributions: Conception or design: MKM and QKH. Acquisition, analysis, or interpretation of data and writing of the manuscript: MKM, QKH, and EHC. Supervision: QKH. Citation: Mosleh, M. K.; Hassan, Q. K.; Chowdhury, E. H. (2016). Development of a remote sensing-based rice yield forecasting model. Spanish Journal of Agricultural Research, Volume 14, Issue 3, e0907. http://dx.doi.org/10.5424/sjar/2016143-8347. Received: 20 Jul 2015. Accepted: 13 Jul 2016. Copyright © 2016 INIA. This is an open access article distributed under the terms of the Creative Commons Attribution-Non Commercial (by-nc) Spain 3.0 Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Funding: National Sciences and Engineering Research Council of Canada (grant to QKH); Egyptian Government (PhD scholarship to MKM). Competing interests: The authors have declared that no competing interests exist. Correspondence should be addressed to Quazi K. Hassan: qhassan@ucalgary.ca.


Introduction
Rice (Oryza sativa L.) is a primary food for more than three billion people worldwide (Khush, 2005).It is cultivated on about 11.5% of the world's arable land during 2012 (http://www.fao.org/3/a-i4147e.pdf);however more than 88% production is observed in Asian countries (http://faostat.fao.org/site/377/DesktopDefault.aspx?PageID=377#ancor).In the recent decades, two major issues like population growth in particular to the rice consuming countries where ~60% of the world's population lives (http://www.geohive.com/earth/his_proj_asia.aspx) and climate change put enormous pressure on the rice demand (IPCC, 2014).As such, understanding (in other words, forecasting) the 2 tics (BBS) gathers information regarding the area under rice plantation using field surveys and interviewing the farmers.In terms of yield/production information, the BBS collects data from 9,348 number of plots across the country (BBS, 2010).Despite the accuracy of these data and its ability to depict historical trends, this method has two major drawbacks: (i) time-consuming, subjective, costly, and labour-intensive (Reynolds et al., 2000;Prasad et al., 2006;Nguyen et al., 2012); and (ii) the outcomes are usually made available to the government and public after several months of the harvesting of the crop; thus not useful for food security purposes (Noureldin et al., 2013).In order to address these issues, an alternate method is the use of the remote sensing-based techniques that have already demonstrated effectiveness in forecasting the rice yield (Jing-Feng et al., 2002;Wang et al., 2010;Chen et al., 2011;Nuarsa et al., 2012;Huang et al., 2013) and assessing the yield for other crops (Bonilla et al., 2015;Fortes et al., 2015).It is being possible as remote sensing platforms are able to acquire cropping season dynamics over a large geographic extent on timely fashion in the form of images.These images, in general, depict crop-specific characteristics, which could be important in developing pre-harvest yield forecasting models (Jing-Feng et al., 2002;Liu & Kogan, 2002;Prasad et al., 2006;Salazar et al., 2007;Mkhabela et al., 2011).For example: (i) Patel et al. (1991) established an empirical relationship between the Indian Remote Sensing Linear Imaging Self Scanning (IRS LISS)-derived ratio between near infrared (NIR) and red (R) spectral bands and ground-based yield; and found that the coefficient of determination (R 2 ), root mean square error (RMSE), and deviations were 0.52, 2.62, and in the range 2-14%, respectively, over India; (ii) Rahman et al. (2009;2012) utilized Advanced Very High Resolution Radiometer (AVHRR)-derived 7-day composite of normalized difference vegetation index (NDVI) and brightness temperature at 16 km resolution to compute several vegetation health-related indices such as vegetation condition index, temperature condition index, and vegetation health index in forecasting yield for two types of rice, i.e., aus and aman over Bangladesh.They observed that modelled and ground-based rice yield revealed a R 2 of 0.56 and 0.89 for aus and aman respectively; (iii) Savin & Isaev (2010) used MODIS-derived 10-day composite of NDVI at 250m resolution, fraction of absorbed radiation and two meteorological variables (i.e., temperature and incident solar radiation) to develop a process-based model for forecasting rice yield over Republic of Kalmykia.They found that the model performed best (i.e., overly forecasted in the range 14-48%) during the maximum greenness period; (iv) Nuarsa et al. (2012) employed Landsat ETM+-derived NDVI-values at 30 m resolution during the peak greenness period (i.e., 63 days after the plantation); and observed a good relationship (i.e., R 2 ≈0.93) between forecasted rice yield and ground-based estimates in Indonesia; (v) Noureldin et al. (2013) used different reflective spectral bands of SPOT-4 and several vegetation indices at 20 m resolution during the peak greenness stage over Kafr El-Sheikh Governorate, Egypt.Among these, the spectral bands of R and NIR, and vegetation indices of difference vegetation index, ratio vegetation index, infrared percentage vegetation index, soil adjusted vegetation index, and NDVI demonstrated strong relations (R 2 in the range 0.90 to 0.95) with ground-based rice yield; and (vi) Son et al. (2013) used MODIS-derived 8-day composite of enhanced vegetation index (EVI) and leaf area index (LAI), and then developed eight models using linear, quadratic, interaction, and pure quadratic equations over Mekong Delta, Vietnam.They found the quadratic model based on EVI and LAI generated the best results (i.e., R 2 of 0.70 and 0.74 for spring-winter and autumn-summer rice crop, respectively) during ripening stage.In addition, established model was validated against the ground-based yield statistics and revealed good results (i.e., RMSE of 10.18, 17.65, and 17%; mean absolute error of 8.44, 14.06, and 9.14%; and mean bias error of 0.9, 3.52, and 2.31% for spring-winter, and autumn-summer of 2006 and 2007; respectively).
In general, most of the above mentioned methods are empirical in nature, so thus they would require further calibration and validation prior to implement over other geographical locations.In addition, the mechanism of determining the agricultural fields with rice crops was not described in the above mentioned papers.As such, we propose a rice yield forecasting system and its application for boro rice (i.e., cultivated during the months of January to May) in Bangladeshi context.This system consists of two distinct steps, such as (i) the delineation of area under rice cultivation using two16-day composite of MODIS-derived NDVI-images at 250m resolution that coincided with the time period of the initial (i.e., January 1 to January 16) and the peak greenness (i.e., March 23/24 to April 6/7 depending on leap year) of the rice of interest; and (ii) forecasting of rice yield as a function of NDVI values acquired at the peak greenness period by developing an empirical method similar to those illustrated in the previous paragraph.

Study area
In this study, we considered the whole country of Bangladesh as our study area (Fig. 1).It lies between 20° 34′ to 26° 38′ North latitude and 88° 01′ to 92° 41′ 3 Remote sensing of rice yield forecasting East longitude (Fig. 1b).The country has an area of approximately 143,998 km 2 with a total population of 160 million (https://www.cia.gov/library/publications/the-world-factbook/geos/bg.html).The study area experiences almost flat topography except for the Chittagong Hilly areas in the southeastern part, the Sylhet mountain range in the northeast, the Madhupur Tract in the central part, and the Barind Tract in the northwest.In terms of climate, it observes sub-tropical conditions with three dominant seasons, i.e., dry winter (December to February), pre-monsoon hot summer (March to May), and the rainy monsoon season (June to October) (Rashid, 1991).Temperature-wise, January is the coldest month with approximately 17°C and April is the hottest one with temperature in between 30-40°C.On an average per year, Bangladesh receives a minimum of 2000 mm of rainfall over most of the regions except the northwestern region of Rajshahi (~1600 mm), and coastal area of Chittagong in the southeast and northeastern part of Sylhet (~4000 mm) (http://www.weatheronline.co.uk/reports/climate/Bangladesh.htm).It is interesting to mention that approximately 80% of the rainfall happens during the monsoon season (i.e., June to October) and negligible amount in the dry season (i.e., December to February).Despite some variability in climatic conditions (in terms of both temperature and rainfall regimes in particular) throughout Bangladesh, the boro cultivation primarily depends on the ground water based irrigation schemes, which is usually being controlled by the farmers.Thus, we considered not to investigate the impact of natural climatic differences on the boro production across the country.Also, note that Bangladesh is a natural disaster (that include flood, cyclone, drought, river erosion, etc.) prone country, which plays critical role in the agricultural productivity including boro rice.
Bangladesh is an agricultural-dominant country, where approximately 62.8% of the total area is dedicated towards crop production.Among all crops, rice is the major contributor, i.e., cultivated in approximately 75% of the agricultural land (Karmokar & Shitan, 2012).In general, there are mainly three different types of rice are cultivated, i.e., aus (during April/May to July/August), aman (during April/May to November/ December), and boro (during January/February to May/ June).It is worthwhile to note the boro rice is the dominant contributor [i.e., approximately 55% of the total rice production during 2012 season (BBS, 2012)] among all kinds of rice production.

Data requirements
Here, we employed MODIS-derived two 16-day composite of NDVI images at 250 m resolution (i.e., MOD13Q1 version 005) during the initial (i.e., January 1 to January 16) and peak greenness (i.e., March 23/24 to April 6/7 depending on leap year) stages for the period 2007-2012 associated with the boro rice developmental stages.These images were corrected both geometrically and atmospherically; and quantified in the scope of NASA's earth observing system program (Huete et al., 2002;Justice et al., 2002).Upon acquisition of these images, we performed a set of pre-processing prior to the development of the rice forecasting method, such as (i) re-projecting of the NDVI images into Transverse Mercator from the default Sinusoidal Gridded Projection system; (ii) mosaicing of the two tiles of NDVI images in order to have continuous images for the entire study area; and (iii) identifying cloud-contaminated pixels using the quality assessment information, and excluded from further analysis.
Apart from the MODIS data, we employed two ancillary datasets, such as (i) boro suitability map available from Bangladesh Agricultural Research Council (BARC) to aid the delineation of the area under boro rice cultivation, and (ii) ground-based information about boro rice area and its yield available from BBS to calibrate and validate the performance of results obtained from MODIS-derived boro rice estimates of both area and its yields.The boro suitability map was developed at 1:250,000 scale as a function of soil permeability, soil depth, soil moisture, nutrient regimes, soil salinity, soil pH, soil consistency, drainage, inundation depth, flood hazards, slope, length of the crop growing season, and temperature regimes (Hussain et al., 2012).This map was released in 2012 and consisted of five categories, such as very suitable, suitable, moderately suitable, marginally suitable, and not suitable for boro cultivation.In case of ground-based information, they were collected using regular sampling surveys and interviewing the farmers.

Methods
The methods had two major components, (i) delineating areas under rice cultivation before harvesting; and (ii) forecasting rice yield as a function of NDVI and assessing the accuracies.

Delineating areas under rice cultivation
In delineating areas under rice cultivation, we proposed a new method that employed two 16-day compos-ite NDVI images acquired during initial (NDVI initial ) and peak greenness (NDVI peak-greenness ) stages (Fig. 2).This method was based on the exploitation of the fact that the rice plants would demonstrate low and high NDVI-values per unit area (250 × 250 m2 in this study) during the initial and peak greenness stages respectively.In addition, we derived a difference NDVI image, i.e., NDVI difference by subtracting the NDVI peak-greenness and NDVI initial images.Then we classified these three images using layer-specific threshold-values (i.e., T initial , T peak-greenness , and T difference ) and combined them in generating rice acreage map if they would satisfy the following set of conditions: • NDVI initial ≥ T initial ; • NDVI peak-greenness ≥ T peak-greenness ; and • NDVI difference ≥ T difference .
In defining the above-mentioned threshold-values, we considered the NDVI images acquired during the period 2007-2009.For a specific season/year, we generated a stack image comprising of three layers, i.e., NDVI initial , NDVI peak-greenness , and NDVI difference images.Then we classified this stack image into 40 clusters using an unsupervised classifier.Each of these clusters was then evaluated against the boro suitability map available from BARC.The clusters that fell in the scope of one of the three categories (very suitable, suitable, and moderately suitable) of the boro suitability map, were further considered as boro fields.These cluster-specific signatures/ patterns were then assimilated for each of the years for the period 2007-2009 as shown in Fig. 3; and used to define the variable (NDVI initial , NDVI peak-greenness , and NDVI difference )-specific thresholds (T initial , T peak-greenness , and T difference ).In these cases, the minimum-values of NDVIinitial , NDVI peak-greenness , and NDVI difference were considered as their respective thresholds; where the values of T initial , T peak-greenness , and T difference were found to be 0.21, 0.58, and 0.16, respectively (see Fig. 3 for details).
Once the thresholds were determined, we used them over independent datasets, i.e., the NDVI images during the period 2010-2012 (50% of the entire NDVI dataset); and produced season-specific maps for areas under boro rice cultivation.These were then evaluated against groundbased information at 23 district-levels available from BBS using a set of statistical measures, such as R 2 , RMSE, and RE, using the following mathematical expressions: where, n is the total number of observations, O i is the individual observed data, O is the average of the observed data, P i is the individual forecasted/predicted data, and P is the average of forecasted/predicted data.

Forecasting of rice yield
Upon delineating the rice area maps as outlined in the above section, we performed the following three steps in developing a yield model (Fig. 4).Firstly, we calculated district-specific average NDVI-values (i.e., NDVI peak−greenness ) upon considering all the rice pixels during the peak greenness period (i.e., March 23/24 to April 6/7 depending on leap year).Secondly, we estab-lished empirical relations between the NDVI peak−greenness and district-specific average yields during 2007-2009 period; which was quite similar to other studies (Rahman et al., 2009;Savin & Isaev, 2010;Nuarsa et al  6 2012; Huang et al., 2013).The regression-based model is one of the most widely used statistical technique for investigating the rela tionship between variables.Thus the forecasting of rice yield as a function of NDVI values during the peak greenness period is explained by the following relationship: where, Y is the average yield values for each individual district, a and b are the regression coefficients, and NDVI peak−greenness is the average NDVI values for each district during peak greenness period.Finally, we implemented the observed relationship (i.e., coefficients in Eq. [4]) in forecasting rice yield during 2010-2012 period, which were then compared against ground-based estimates.In this case, we em-

Delineation of areas under rice cultivation
Upon applying the thresholds determined in subsection "Delineating areas under rice cultivation", we applied them over validation dataset during the period 2010-2012 and the results are shown in Fig. 5.It revealed strong relations (i.e., R 2 in the range 0.93 to 0.95 and RMSE in the range 30,519 to 37,451 ha, at 23 district-levels) between modeled and ground-based area estimates.Our results were also similar to other studies: (i) Panigrahy et al. (1997) obtained accuracies in the range 85.8 to 91.5% in the early estimation of rice areas in Wet Bengal, India; (ii) Patel et al. (2004)   face water index for mapping the rice area over the Sanjiang Plain of northeast China, and found very good results (R 2 = 0.85) while comparing with the census data.
In addition, we also compared the boro rice area estimated using only two images acquired during the initial/transplantation and peak greenness stages in the scope of this paper with those extracted in an earlier study (Mosleh & Hassan, 2014).In that study, we employed ten 16-day composite of NDVI images acquired over the entire growing season and estimated the boro rice area after the harvesting of the crop.It would be interesting to note that the outcomes of the proposed method outperformed the earlier one.For example, the overall RE's were in the range ±10% using the proposed method in comparison to about ±30% using the method described in the earlier study (Mosleh & Hassan, 2014) (see Fig. 6 for details).Our findings implied that more images induced more error similar to other studies, e.g., (LeGrand, 2002); which were most likely related with data/image redundancies.In order to address this, one of the possible alternates would be the employment of 'principle component analysis' over all the ten images acquired during the entire growing sea-son (e.g., Mosleh & Hassan, 2014); which was beyond the scope of this paper.Note that both methods would complement each other as understanding the area (thus the yield as described previously) approximately one to two months prior to harvesting would be critical in ensuring food security.However, only the earlier estimation would hold if any natural disaster would not take place.

Forecasting of rice yield
Figure 7 shows the relationship between the districtspecific average NDVI values at the peak greenness period (i.e., in between March 23/24 to April 6/7 depending on leap year) and ground-based rice yield estimates during 2007-2009.It demonstrates strong relationships (i.e., R 2 in the range of 0.75-0.89),which are similar to other studies (Nusara et al., 2011;Noureldin et al., 2013).
The relationship observed in Fig. 7d (i.e., three years, 2007Fig. 7d (i.e., three years, -2009, of combined data) , of combined data)   Remote sensing of rice yield forecasting RMSE, were in the range 0.76 to 0.86, 0.21 to 0.29 Mton/ha, respectively during the period 2010-2012) between ground-based and forecasted yields at 23 district-levels.In addition, we also observed that REs were in the range -5.45 to +6.65% when compared at 23 district-levels (Fig. 9).In fact, our findings were similar to other studies, such as: (i) Despite good agreements, it would be worthwhile to note that our forecasting would hold if the rice crop not be affected by natural disturbances (that include cyclone, insect outbreak, etc.).In addition, approximately 14 to 24% of disagreements between the ground-based and forecasted rice yield estimates could be attributed by other factors, such as (i) satellite images might be affected by atmospheric effects (e.g., cloud), which degrade the quality of the acquired data and thus the developed crop-yield model (Mkhabela et al., 2011); (ii) variation in climatic conditions at microlevel during the growing season could potentially impact the agreement level of rice yield (Son et al., 2013;Mosleh et al., 2015); and (iii) uncertainty associated with ground-based yield estimates due to insufficient observations could lead to poor rice yield assessment (Mosleh & Hassan, 2014).
Although our proposed method demonstrated its effectiveness in forecasting rice yield; we would require to consider the following issues in order to potentially enhance its capability, such as (i) the use of relatively high resolution satellite imagery would able to distinguish rice cultivated area more precisely in particular to heterogeneous agricultural practices; (ii) it would be necessary to revisit the calibration parameters, i.e., 1:1 Line 1:1 Line 1:1 Line 10 slope and intercept of the expression illustrated in Fig. 7d periodically; and (iii) it would be worthwhile to investigate other important weather variables, such as temperature, precipitation, and solar radiation in particular; as they were also used in the of rice yield forecasting framework (Savin & Isaev, 2010).In addition, we strongly recommend that the proposed modelling schema be thoroughly evaluated prior to implementing in other geographical locations.

Figure 1 .
Figure 1.Study area (Bangladesh) in Southeast Asia (a), and 23 districts in Bangladesh (b) where ground-based boro rice cultivation area and its yield/production information were available (adopted from Mosleh & Hassan, 2014).

Figure 2 .Figure 3 .
Figure 2. Schematic diagram for delineating the area under boro rice cultivation before harvesting.

Figure 6 .
Figure 6.Comparison of the model and ground-based estimates of boro rice cultivation areas at 23 district-levels during the period 2010-2012 using ten NDVI images acquired over the entire growing season and two NDVI images acquired during the initial and peak greenness. 8

Figure 7 .
Figure7shows the relationship between the districtspecific average NDVI values at the peak greenness period (i.e., in between March 23/24 to April 6/7 depending on leap year) and ground-based rice yield estimates during 2007-2009.It demonstrates strong relationships (i.e., R 2 in the range of 0.75-0.89),which are similar to other studies(Nusara et al., 2011;Noureldin et al., 2013).The relationship observed in Fig.7d(i.e., three years, 2007-2009, of combined data) was implemented in forecasting rice yield during 2010-2012; and is shown in Fig.8.It reveals strong relationships (i.e., R 2 , NDVI-values during peak greenness period 9