Use of multivariate analysis and machine learning methods to characterize traits contributing to wheat yield diversity

Keywords: Triticum aestivum, multivariate statistical analysis, partial least squares regression, support vector regression

Abstract

Aim of study: Wheat is the third largest staple food crop in the world, so determining the factors that affect its yield is of great importance. This study aimed to identify useful subsets of agronomic traits and to rank the traits by their importance to grain yield.

Area of study: Fars province, Iran.

Material and methods: Data on 22 agronomic traits were collected from 90 farms in six regions (Darab, Kavar, Marvdasht, Fasa, Lar, and Khonj) of Fars province, Iran, selected as the most important wheat-growing regions. Multivariate statistical analyses (correlation, stepwise regression, and principal component analysis (PCA)) and machine learning modeling approaches, namely partial least squares regression (PLSR) and support vector regression (SVR), were applied to the agronomic traits.

Main results: The integrated results of correlation, stepwise regression, and PCA showed that number of spikes m⁻², grain number spike⁻¹, and thousand-grain weight had the greatest impact on yield, followed by awn length, spike length, narrow-leaf herbicide, broadleaf herbicide, time to plant maturity (month), and soil salinity. In addition, PLSR with nine inputs (the nine selected traits) showed better prediction capability (R² = 85%, RMSE = 0.32, MSE = 0.10, and BIAS = -0.05) than PLSR with all twenty-two input traits.
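
The four reported statistics can be computed from observed and predicted yields as in the sketch below. The definitions used are common ones and are assumptions here, since the abstract does not state them; in particular, bias is taken as the mean of predicted minus observed. The sample values are hypothetical. The reported figures are consistent with MSE = RMSE², since 0.32² ≈ 0.10.

```python
import math

def regression_metrics(observed, predicted):
    """Return R2, RMSE, MSE, and BIAS for paired observed/predicted values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    # Residual sign convention (assumed): predicted minus observed.
    residuals = [p - o for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    mse = ss_res / n
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(mse),
        "MSE": mse,
        "BIAS": sum(residuals) / n,
    }

# Hypothetical yields, for illustration only.
observed = [4.1, 5.0, 3.6, 6.2, 4.8]
predicted = [4.0, 5.3, 3.5, 5.9, 4.9]

metrics = regression_metrics(observed, predicted)
for name, value in metrics.items():
    print(f"{name} = {value:.3f}")
```

A negative bias under this convention means the model under-predicts yield on average.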

Research highlights: Integrating multivariate statistical analyses with machine learning regression methods could be a powerful tool for identifying traits that have a significant impact on yield. These findings can be considered in future breeding programs.


Published
2023-02-01
How to Cite
BEHPOURI A., FAROKHZADEH S., ZINATI Z., & KHOSRAVI Z. (2023). Use of multivariate analysis and machine learning methods to characterize traits contributing to wheat yield diversity. Spanish Journal of Agricultural Research, 21(1), e0901. https://doi.org/10.5424/sjar/2023211-19835
Section
Plant production (Field and horticultural crops)