Use of multivariate analysis and machine learning methods to characterize traits contributing to wheat yield diversity

Keywords: Triticum aestivum, multivariate statistical analysis, partial least squares regression, support vector regression


Aim of study: Wheat is the third largest staple food crop in the world, so determining the factors that affect its yield is of great importance. This study aimed to identify useful subsets of agronomic traits and to rank the traits by their importance to grain yield.

Area of study: Fars province, Iran.

Material and methods: Data on 22 agronomic traits were collected from 90 farms in six regions (Darab, Kavar, Marvdasht, Fasa, Lar, and Khonj) of Fars province, Iran, selected as the most important wheat-growing regions. Multivariate statistical analyses (correlation, stepwise regression, and principal component analysis (PCA)) and machine learning approaches, including partial least squares regression (PLSR) and support vector regression (SVR), were applied to the agronomic traits.
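The correlation step named above can be illustrated with a minimal sketch that ranks traits by the absolute value of their Pearson correlation with grain yield. The trait names, the toy values, and the helper functions `pearson` and `rank_traits` are hypothetical and are not taken from the study; the authors' actual software and settings are not stated here.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def rank_traits(traits, grain_yield):
    """Return (trait, r) pairs sorted by |r| with yield, strongest first."""
    scores = {name: pearson(values, grain_yield) for name, values in traits.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Hypothetical toy data for five farms (illustration only, not study data).
toy_traits = {
    "spikes_per_m2": [310, 420, 380, 450, 290],
    "thousand_grain_weight": [38.0, 41.5, 40.0, 43.0, 37.5],
    "soil_salinity": [4.1, 2.0, 2.8, 1.5, 3.0],
}
toy_yield = [3.9, 5.6, 5.0, 6.1, 3.7]

ranking = rank_traits(toy_traits, toy_yield)
for name, r in ranking:
    print(f"{name}: r = {r:+.2f}")
```

A ranking of this kind is only a first screen. Stepwise regression, PCA, PLSR, and SVR, as listed in the methods, account for relationships among traits that pairwise correlation ignores.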

Main results: The integrated analyses (correlation, stepwise regression, and PCA) showed that number of spikes m⁻², grain number spike⁻¹, and thousand-grain weight had a major impact on yield, followed by awn length, spike length, narrow leaf herbicide, broadleaf herbicide, time to plant maturity (month), and soil salinity. In addition, PLSR with the nine selected traits as inputs showed better prediction capability (R² = 85%, RMSE = 0.32, MSE = 0.10, and BIAS = -0.05) than PLSR with all twenty-two input traits.
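The four statistics reported above can be computed as in the following minimal sketch. It assumes the common definitions: R² as one minus the ratio of residual to total sum of squares, and bias as the mean of predicted minus observed values. The abstract does not state which definitions or sign convention the authors used, and the function name `regression_metrics` and the example values are hypothetical.

```python
import math

def regression_metrics(observed, predicted):
    """Compute R2, RMSE, MSE, and bias for paired observed/predicted values."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]  # predicted minus observed
    mse = sum(e * e for e in errors) / n                   # mean squared error
    rmse = math.sqrt(mse)                                  # root mean squared error
    bias = sum(errors) / n                                 # mean error (assumed sign convention)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)    # total sum of squares
    ss_res = sum(e * e for e in errors)                    # residual sum of squares
    r2 = 1.0 - ss_res / ss_tot                             # coefficient of determination
    return {"R2": r2, "RMSE": rmse, "MSE": mse, "BIAS": bias}

# Hypothetical values for illustration only.
metrics = regression_metrics([4.0, 5.0, 6.0, 7.0], [4.5, 4.5, 6.5, 6.5])
print(metrics)
```

Under these definitions MSE is the square of RMSE, which agrees with the reported values: 0.32² ≈ 0.10.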

Research highlights: Integrating multivariate statistical analyses with machine learning regression methods could be a powerful tool for determining the traits that have a significant impact on yield. These findings can be considered in future breeding programs.





How to Cite
BEHPOURI, A., FAROKHZADEH, S., ZINATI, Z., & KHOSRAVI, Z. (2023). Use of multivariate analysis and machine learning methods to characterize traits contributing to wheat yield diversity. Spanish Journal of Agricultural Research, 21(1), e0901.
Plant production (Field and horticultural crops)