Research Article

Predicting methionine and lysine contents in soybean meal and fish meal using a group method of data handling-type neural network

Majid Mottaghitalab

University of Guilan, Faculty of Agricultural Science, Department of Animal Science. PO Box 41635-1314, Rasht, Iran

Mohsen Nikkhah

University of Guilan, Faculty of Agricultural Science, Department of Animal Science. PO Box 41635-1314, Rasht, Iran

Hassan Darmani-Kuhi

University of Guilan, Faculty of Agricultural Science, Department of Animal Science. PO Box 41635-1314, Rasht, Iran

Secundino López

Universidad de León, Instituto de Ganadería de Montaña (CSIC-ULE), Departamento de Producción Animal. 24071 León, Spain

James France

University of Guelph Centre for Nutrition Modelling, Department of Animal and Poultry Science, Guelph ON, N1G 2W1, Canada

Abstract

Artificial neural network models offer an alternative to linear regression analysis for predicting the amino acid content of feeds from their chemical composition. A group method of data handling-type neural network (GMDH-type NN), designed with an evolutionary genetic algorithm, was used to predict methionine (Met) and lysine (Lys) contents of soybean meal (SBM) and fish meal (FM) from their proximate analyses (i.e. crude protein, crude fat, crude fibre, ash and moisture). A data set with 119 data lines for Met and 116 lines for Lys was used to develop GMDH-type NN models with two hidden layers. The data lines were divided into two groups to produce training and validation sets. The data sets were imported into the GEvoM software for training the networks. The predictive capability of the constructed models was evaluated by their ability to estimate the validation data sets accurately. A quantitative examination of goodness of fit for the predictive models was made using a number of precision, concordance and bias statistics. The statistical performance of the models developed revealed close agreement between observed and predicted Met and Lys contents for SBM and FM. The results of this study clearly illustrate the validity of GMDH-type NN models to estimate accurately the amino acid content of poultry feed ingredients from their chemical composition.

Additional key words: amino acid; feed ingredients; genetic algorithm; model; poultry; proximate analysis.

Abbreviations used: AA (amino acid); ANN (artificial neural network); CCC (concordance correlation coefficient); CF (crude fibre); CP (crude protein); EB (error attributable to overall bias); ED (error attributable to random disturbance); ER (error attributable to deviation of the regression slope from unity); FAT (crude fat); FM (fish meal); GMDH-type NN (group method of data handling-type neural network); MAPE (mean absolute percentage error); MSPE (mean square prediction error); MST (moisture); RMSPE (square root of the mean square prediction error); SBM (soybean meal).

Citation: Mottaghitalab, M.; Nikkhah, M.; Darmani-Kuhi, H.; López, S.; France, J. (2015). Predicting methionine and lysine contents in soybean meal and fish meal using a group method of data handling-type neural network. Spanish Journal of Agricultural Research, Volume 13, Issue 1, e06-001, 8 pages. http://dx.doi.org/10.5424/sjar/2015131-5877.

Received: 10 Mar 2014. Accepted: 04 Feb 2015

http://dx.doi.org/10.5424/sjar/2015131-6959

Copyright © 2015 INIA. This is an open access article distributed under the Creative Commons Attribution License (CC by 3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Funding: Funding for JF was provided by the NSERC Canada Research Chairs Program.

Competing interests: The authors have declared that no competing interests exist.

Correspondence should be addressed to Majid Mottaghitalab: mottaghi2002@yahoo.co.uk, or Secundino López: s.lopez@unileon.es (shared corresponding authors).

CONTENTS

Abstract

Introduction

Material and methods

Results

Discussion

References

Introduction

Methionine (Met) and lysine (Lys) are the two most limiting amino acids (AA) in broiler diets based on corn and soybean meal (SBM). Supplementation of broiler feeds with synthetic forms of these AA is very common in the poultry industry to improve dietary protein quality, reduce nitrogen excretion, and minimise feed cost. More profitable production systems and better carcass yields can be achieved by adjusting dietary protein levels, which requires information on the levels of AA, particularly Met and Lys, in feed ingredients (Mendes et al., 1997). For poultry and pigs, accurate estimation of Lys is necessary because it is the reference essential AA in the ideal protein concept (Baker & Han, 1994). Conventional methods for the separation and analysis of Met and Lys in feedstuffs (AOAC, 1990) are expensive, cumbersome and time-consuming. These disadvantages have prompted a search for alternative methods based on predicting AA content from other chemical fractions or from near-infrared spectra (Kovalenko et al., 2006). Both predictive approaches require similar computational methodologies. Two quantitative methods of predicting AA levels in feed ingredients have been developed using linear regression, with either crude protein (CP) (Degussa Corporation, 1990) or the proximate chemical constituents of feed ingredients (Monsanto, 1986a,b,c) as inputs. The National Research Council (NRC, 1994) has accepted these regression approaches for predicting AA composition. However, some of the AA prediction equations perform poorly, with low R2 values (<0.50) in certain cases (Degussa Corporation, 1990; Monsanto, 1986a,b,c). Since R2 reflects the proportion of the variability in AA content that is explained by the equation, such low values indicate that a more accurate method of AA prediction is desirable.

Artificial neural networks (ANN) can effectively capture the complex relationship between ingredient composition (inputs) and nutrient levels (outputs). ANN are applied in many fields to model and predict the behaviour of systems that are unknown, complex, or both, based on given input-output data. Using ANN does not require an a priori equation or model, a characteristic that is potentially advantageous in modelling biological processes (Dayhoff & De Leo, 2001). ANN predictions usually fit the data more closely than regression analysis, and this better fit usually leads to more accurate predictions (Ward Systems Group, 1993).

One type of ANN is the group method of data handling-type neural network (GMDH-type NN). It is a self-organizing approach in which progressively more complex models are generated based on the evaluation of their performance on a set of multi-input, single-output data pairs. GMDH was first developed by Ivakhnenko (1971) as a multivariate analysis method for modelling and identifying complex systems. GMDH-type NNs have been used to avoid the need for prior knowledge of the mathematical model of the process being considered; in other words, they can model complex systems without specific knowledge of those systems.

The main idea of GMDH is to build an analytical function in a feed-forward network based on a quadratic node transfer function whose coefficients are obtained using linear regression procedures (Farrow, 1984). The use of such self-organizing networks has led to their successful application in a broad range of areas in engineering, science, and economics (Seginer et al., 1994; Vallejo-Cordoba et al., 1995; Roush et al., 1996). The GMDH-type NNs have been used in poultry science for the prediction of broiler performance (Faridi et al., 2011; 2013b), turkey performance (Mottaghitalab et al., 2010), egg production of broiler breeder hens (Faridi et al., 2012; 2013a) and true metabolizable energy content in feather meal and poultry offal meal (Ahmadi et al., 2008).
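The node computation described above can be sketched in plain Python. The sketch assumes the usual six-term Ivakhnenko polynomial for each node, y = a0 + a1·xi + a2·xj + a3·xi·xj + a4·xi² + a5·xj², fitted by ordinary least squares, and it keeps the best nodes of a layer according to their squared error on a separate checking subset, as in classic GMDH. It is a minimal illustration, not the GEvoM implementation used in this study (where a genetic algorithm designs the network), and all function names are hypothetical.

```python
from itertools import combinations


def quad_terms(xi, xj):
    """Regressors of the quadratic node: 1, xi, xj, xi*xj, xi^2, xj^2."""
    return [1.0, xi, xj, xi * xj, xi * xi, xj * xj]


def solve(a, b):
    """Solve the square linear system a·x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[k]] for k, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_node(xi, xj, y):
    """Least-squares coefficients a0..a5 of one node, via the normal equations."""
    rows = [quad_terms(u, v) for u, v in zip(xi, xj)]
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(6)] for p in range(6)]
    xty = [sum(r[p] * t for r, t in zip(rows, y)) for p in range(6)]
    return solve(xtx, xty)


def node_output(coef, xi, xj):
    """Evaluate a fitted node on every row."""
    return [sum(c * t for c, t in zip(coef, quad_terms(u, v))) for u, v in zip(xi, xj)]


def build_layer(columns, y, n_train, keep):
    """Classic GMDH layer: fit one node per pair of input columns on the first n_train
    rows, score it by squared error on the remaining (checking) rows, keep the best."""
    nodes = []
    for i, j in combinations(range(len(columns)), 2):
        coef = fit_node(columns[i][:n_train], columns[j][:n_train], y[:n_train])
        out = node_output(coef, columns[i], columns[j])
        err = sum((o - t) ** 2 for o, t in zip(out[n_train:], y[n_train:]))
        nodes.append(((i, j), coef, err, out))
    nodes.sort(key=lambda node: node[2])
    return nodes[:keep]  # the survivors' outputs become the next layer's inputs
```

Stacking two such layers, with the outputs of the surviving nodes fed to the next layer, yields a network whose final output is a composition of quadratic polynomials in the original inputs.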

The present study was conducted to examine the capability of GMDH-type NNs in predicting Met and Lys contents of SBM and fish meal (FM) based on their proximate chemical composition.

Material and methods

Data sources

Data were taken from published literature (Roush & Cravener, 1997; Bimbo & Crowther, 1992; NRC, 1994; Fickler, 1995) reporting the necessary information on CP, crude fat, crude fibre, moisture, ash, and AA contents (all in g/100 g feed) of SBM and FM. In all cases, analytical techniques followed accepted procedures of AOAC (1990). A data set was compiled with 119 data lines for Met and 116 lines for Lys, and used to train (calibrate) and validate the GMDH-type NNs. Each data line consisted of CP, crude fat, crude fibre, ash and moisture and measured Lys or Met contents (all in % or g/100 g feed) for an individual sample. Normal distribution of each variable was verified, and the occurrence of outliers was tested using standardized Z-scores (Steel & Torrie, 1981).
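The outlier screening with standardized Z-scores can be illustrated with a minimal Python sketch. The cut-off |Z| > 3 and the crude protein values below are assumptions for illustration only; the source does not report the threshold that was applied, and the helper names are hypothetical.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardized Z-scores: (x - mean) / sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def flag_outliers(values, threshold=3.0):
    """Indices of observations whose |Z| exceeds the (assumed) threshold."""
    return [i for i, z in enumerate(z_scores(values)) if abs(z) > threshold]


# Hypothetical CP values (g/100 g) for SBM samples; the last one is suspicious.
cp = [46.1, 47.0, 46.5, 47.3, 46.8, 45.9, 47.1, 46.4, 46.9, 47.2,
      46.6, 46.7, 47.0, 46.3, 46.8, 47.1, 46.5, 46.9, 46.2, 60.0]
print(flag_outliers(cp))
```

With small samples the largest attainable |Z| is bounded by (n − 1)/√n, so a fixed cut-off of 3 can only flag anything once n is about 11 or more.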

Model development

Detailed descriptions of GMDH-type NN terminology, development, application, and examples have been reported previously (Farrow, 1984; Müller & Lemke, 2000; Lemke & Müller, 2003; Nariman-Zadeh et al., 2005). In this study, the inputs of this multi-input, single-output system used to predict Lys and Met were the CP, crude fat, crude fibre, ash and moisture contents of the feed samples. The data lines were divided into training and validation sets. Seventy-four input-output data lines for Met (39 from SBM and 35 from FM samples) and 70 input-output data lines for Lys (40 SBM and 30 FM) were randomly selected and used to train the respective networks. The data sets were imported into the GEvoM software for network training (GEvoM, 2014). A genetic algorithm was deployed to find the best network architecture, and networks with two hidden layers were considered for prediction. A population of 15 individuals, a crossover probability of 0.7, a mutation probability of 0.07 and 280 generations were used to design the ANN genetically. After training, the best-performing networks were selected based on statistical criteria and then validated using data from samples not previously used for training and deriving the networks. The validation sets consisted of 25 SBM and 20 FM lines for Met and 27 SBM and 19 FM lines for Lys.
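The random allocation of data lines described above can be sketched as follows, assuming each data line is stored as a record carrying an ingredient label. The record layout, the helper name and the seed are hypothetical; the training counts per ingredient are those reported above, and every remaining line of an ingredient goes to its validation set.

```python
import random

# Training lines per ingredient, as reported in the text.
TRAIN_COUNTS = {
    "Met": {"SBM": 39, "FM": 35},
    "Lys": {"SBM": 40, "FM": 30},
}


def split_by_ingredient(rows, n_train, seed=None):
    """Randomly draw n_train[ingredient] training rows from each ingredient group;
    the rest of each group becomes the validation set."""
    rng = random.Random(seed)
    train, valid = [], []
    for ingredient, k in n_train.items():
        group = [r for r in rows if r["ingredient"] == ingredient]
        rng.shuffle(group)
        train.extend(group[:k])
        valid.extend(group[k:])
    return train, valid
```

For the Met data (64 SBM and 55 FM lines), this gives 74 training and 45 validation lines, matching the counts above.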

Statistical procedures

The predictive performance of the models developed was assessed using several measures of precision and bias between estimated (predicted) and observed (actual) values for each response variable. Two measures of precision were used: 1) the proportion of variance accounted for by the model (R2), and 2) the mean square prediction error (MSPE), calculated as:

$$\mathrm{MSPE} = \frac{1}{n}\sum_{i=1}^{n}\left(O_i - P_i\right)^2$$

where O_i is the observed value, P_i is the predicted value, and n is the number of observed/predicted pairs (Bibby & Toutenburg, 1977). The square root of the MSPE (RMSPE), expressed as a percentage (or proportion) of the observed mean, gives an estimate of the overall prediction error. The MSPE was decomposed into random (disturbance) error (ED), error attributable to deviation of the regression slope from unity (ER), and error attributable to overall bias (EB).
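The MSPE and its decomposition can be computed as in the following Python sketch. It assumes the usual Bibby & Toutenburg (1977) form with moments taken with divisor n, MSPE = (Ō − P̄)² + (S_P − r·S_O)² + (1 − r²)·S_O², where EB = (Ō − P̄)², ER = (S_P − r·S_O)² (written equivalently in the code as (1 − b)²·S_P², b being the slope of the regression of observed on predicted values) and ED = (1 − r²)·S_O². The function name is hypothetical.

```python
from math import sqrt
from statistics import fmean


def mspe_components(obs, pred):
    """MSPE, RMSPE (% of observed mean) and the EB/ER/ED decomposition (divisor n)."""
    n = len(obs)
    mo, mp = fmean(obs), fmean(pred)
    s_oo = sum((o - mo) ** 2 for o in obs) / n
    s_pp = sum((p - mp) ** 2 for p in pred) / n
    s_op = sum((o - mo) * (p - mp) for o, p in zip(obs, pred)) / n
    mspe = sum((o - p) ** 2 for o, p in zip(obs, pred)) / n
    b = s_op / s_pp                  # slope of the regression of observed on predicted
    r2 = s_op ** 2 / (s_oo * s_pp)   # squared Pearson correlation
    return {
        "MSPE": mspe,
        "RMSPE_pct": 100 * sqrt(mspe) / mo,
        "EB": (mo - mp) ** 2,        # overall bias
        "ER": (1 - b) ** 2 * s_pp,   # deviation of the slope from unity
        "ED": (1 - r2) * s_oo,       # random disturbance
    }
```

Because the three components add up exactly to the MSPE, their relative sizes show whether prediction error is mainly systematic (EB, ER) or random (ED).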

The concordance correlation coefficient (CCC) was calculated to evaluate the precision and accuracy of predicted vs. observed values for the models (Lin, 1989). The CCC estimate is the product of two components: 1) the Pearson correlation coefficient, which is a measure of precision (deviation of observations from the best-fit line), and 2) a bias correction factor, which indicates how far the regression line deviates from the line of unity (accuracy). The location shift relative to the scale (μ, the difference between the observed and predicted means relative to the square root of the product of both standard deviations) is also reported, where a negative value indicates over-prediction and a positive value indicates under-prediction of observed values by the model. Other measures of bias employed were the mean absolute percentage error (MAPE) and the average bias (Oberstone, 1990), calculated as:
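
$$\mathrm{MAPE} = \frac{100}{n}\sum_{i=1}^{n}\frac{\lvert O_i - P_i\rvert}{O_i}$$

$$\text{Average bias} = \frac{1}{n}\sum_{i=1}^{n}\left(O_i - P_i\right)$$

where n is the number of pairs of observed (O_i) and predicted (P_i) values.
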
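The concordance and bias statistics can be sketched in Python as follows. The sketch follows Lin (1989) with population moments (divisor n): CCC = r·Cb, where Cb = 2/(v + 1/v + μ²), v = S_O/S_P is the scale shift and μ = (Ō − P̄)/√(S_O·S_P) is the location shift. The sign convention for average bias (observed minus predicted, so that negative values indicate over-prediction, as for μ) is an assumption, and the function name is hypothetical.

```python
from math import sqrt
from statistics import fmean


def concordance_stats(obs, pred):
    """Lin's CCC and its components, plus MAPE (%) and average bias."""
    n = len(obs)
    mo, mp = fmean(obs), fmean(pred)
    s_o = sqrt(sum((o - mo) ** 2 for o in obs) / n)   # divisor n, as in Lin (1989)
    s_p = sqrt(sum((p - mp) ** 2 for p in pred) / n)
    s_op = sum((o - mo) * (p - mp) for o, p in zip(obs, pred)) / n
    r = s_op / (s_o * s_p)             # Pearson correlation: precision
    v = s_o / s_p                      # scale shift
    mu = (mo - mp) / sqrt(s_o * s_p)   # location shift: < 0 over-, > 0 under-prediction
    cb = 2 / (v + 1 / v + mu ** 2)     # bias correction factor: accuracy
    mape = 100 * fmean([abs(o - p) / o for o, p in zip(obs, pred)])
    avg_bias = fmean([o - p for o, p in zip(obs, pred)])  # assumed sign: O - P
    return {"r": r, "Cb": cb, "CCC": r * cb, "mu": mu, "MAPE": mape, "avg_bias": avg_bias}
```

Writing CCC as r·Cb makes it easy to see whether a low concordance comes from scatter (low r) or from a systematic shift in location or scale (low Cb).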
Results

The range of values for the input and predicted (Met and Lys) variables, and the pair-wise correlation matrices are shown in Table 1 for FM and SBM. In both meals, Met and Lys were positively correlated with CP. The AA contents were negatively correlated with ash in FM, and with crude fibre in SBM.

Table 1. Mean and range of the input and predicted variables (all values in g/100 g feed as fed), and pair-wise correlation matrices (Pearson correlation coefficients).


The optimal structures of the evolved two-hidden-layer GMDH-type NNs identified by the genetic algorithm had five hidden neurons for Met and Lys in FM and for Met in SBM, and six hidden neurons for Lys in SBM. The corresponding polynomial equations obtained are:

  1. Met in FM:
  2. Met in SBM:
  3. Lys in FM:
  4. Lys in SBM:

where CP, FAT, ASH, CF and MST represent the input variables: crude protein, crude fat, ash, crude fibre and moisture, respectively. The observed and predicted values of Met and Lys for the training and validation sets are depicted in Figs. 1 and 2, and the relevant statistics are given in Table 2. High R2 values and low MSPE values (RMSPE was always less than 5% of the observed mean) indicate a high degree of precision in the predictions. The high CCC values and the low contribution of ER to MSPE indicate close agreement (accuracy) between observed and predicted values, with a regression coefficient (slope) between both variables that was in all cases close to unity. Finally, all the bias statistics show small deviations, without substantial over- or under-estimation of the observed (reference) values.

Figure 1. Comparison of observed and model predicted methionine (Met) values for fish meal (FM) and soybean meal (SBM). For FM data points 1 to 35 are for the training set and 36 to 55 for the validation set, and for SBM data points 1 to 39 are for the training set and 40 to 64 for the validation set.

Figure 2. Comparison of observed and model predicted lysine (Lys) values for fish meal (FM) and soybean meal (SBM). For FM data points 1 to 30 are for the training set and 31 to 49 for the validation set, and for SBM data points 1 to 40 are for the training set and 41 to 67 for the validation set.

Table 2. Statistics of the group method of data handling-type neural network models for methionine (Met) and lysine (Lys) predictions (observed vs. predicted values for training and validation sets), for fish meal (FM) and soybean meal (SBM).


Discussion

ANN models offer an alternative to linear regression analysis for investigating biological systems. The advantage of using an ANN to predict an output from several input variables is that, unlike regression analysis, it does not require an a priori equation or model describing the relationship between the variables (Roush & Cravener, 1997). The potential of GMDH-type NNs to predict the behaviour of unknown systems has been demonstrated in poultry production (Faridi et al., 2013a,b; Mottaghitalab et al., 2010). In this study, the validity of this type of ANN model, with its structure designed by a genetic algorithm, was examined for predicting the Met and Lys contents of SBM and FM from their proximate analysis. Figs. 1 and 2 show that the evolved networks, expressed as simple polynomial equations, successfully predicted the validation data, which were not used during the training process. The statistical tests revealed very close agreement between observed and predicted Met and Lys values, although prediction accuracy was higher for SBM.

Acceptance of all input variables (the five nutritional fractions) by the networks indicates that each of them has some influence on the AA levels in the feed ingredients. This finding is similar to that reported by Cravener & Roush (1999), who compared three types of ANN with linear regression for predicting AA levels in feed ingredients and observed that a general regression ANN was superior to linear regression in predicting the AA contents of a number of feed ingredients. Cravener & Roush (2001) showed that genetic algorithm calibration of NN was superior to linear regression and to other ANN calibration approaches in predicting the Met and Lys contents of feeds. The number of neurons in the hidden layers differs considerably between our models and those proposed by Roush & Cravener (1997) and Cravener & Roush (1999, 2001): 5 to 6 vs. 160 to 181 neurons, depending on the AA and feed ingredient. Other approaches, such as near-infrared reflectance spectroscopy (NIRS), have also been used to predict the AA content of feedingstuffs. The accuracy of NIRS predictions of the amount of AA added to compound feedingstuffs for different animal species (Pérez-Marín et al., 2004) was medium for Met (R2=0.77) and limited for Lys (R2=0.58). In contrast, Fontaine et al. (2001) observed excellent performance of NIRS in predicting the essential AA content of protein-rich feed ingredients, with relative mean deviations below 5%. Van Kempen & Bodin (1998) tested the prediction of the true ileal digestible AA content of protein concentrates, obtaining high R2 values for digestible Met and Lys in feeds of animal origin and medium to low R2 values for the same AA in SBM.

The number of neurons in the hidden layers of an ANN model depends on the input variables and the network structure. Using too many neurons in the hidden layers can cause several problems. First, it may result in over-fitting: the ANN has so much information-processing capacity that the limited amount of information contained in the training set is not enough to train all the neurons in the hidden layers. A second problem can occur even when the training data are sufficient: an inordinately large number of hidden neurons may increase the time it takes to train the network, possibly to the point that it becomes impossible to train the ANN adequately (Heaton, 2008).

The AA composition of FM and SBM can be predicted from chemical composition using ANN. This method could also be suitable for predicting the content of truly digestible AA for poultry, although this analysis could not be performed because of the limited data available.

In conclusion, results of this study can be considered as a basis for accepting the validity of GMDH-type NN models to estimate the AA composition of poultry feed ingredients from their corresponding chemical composition with suitable performance.


References

Ahmadi H, Golian A, Mottaghitalab M, Nariman-Zadeh N, 2008. Prediction model for true metabolizable energy of feather meal and poultry offal meal using group method of data handling-type neural network. Poultry Sci 87: 1909-1912. http://dx.doi.org/10.3382/ps.2007-00507.
AOAC, 1990. Official Methods of Analysis, 15th ed. Association of Official Analytical Chemists, Arlington, VA, USA.
Baker DH, Han Y, 1994. Ideal amino acid profile for chicks during the first three weeks post-hatching. Poultry Sci 73: 1441-1447. http://dx.doi.org/10.3382/ps.0731441.
Bibby J, Toutenburg H, 1977. Prediction and improved estimation in linear models. John Wiley and Sons, London, UK.
Bimbo AP, Crowther JB, 1992. Fish meal and oil: current uses. J Am Oil Chem Soc 69: 221-227. http://dx.doi.org/10.1007/BF02635890.
Cravener TL, Roush WB, 1999. Improving neural network prediction of amino acid levels in feed ingredients. Poultry Sci 78: 983-991. http://dx.doi.org/10.1093/ps/78.7.983.
Cravener TL, Roush WB, 2001. Prediction of amino acid profiles in feed ingredients: genetic algorithm calibration of artificial neural networks. Anim Feed Sci Technol 90: 131-141. http://dx.doi.org/10.1016/S0377-8401(01)00219-X.
Dayhoff JE, De Leo JM, 2001. Artificial neural networks: opening the black box. Cancer 91 (Suppl 8): 1615-1635. http://dx.doi.org/10.1002/1097-0142(20010415)91:8+<1615::AID-CNCR1175>3.0.CO;2-L.
Degussa Corporation, 1990. The amino acid composition of feedstuffs. Degussa Corporation, Allendale, NJ, USA.
Faridi A, Mottaghitalab M, Darmani-Kuhi H, France J, Ahmadi H, 2011. Predicting carcass energy content and composition in broilers using the group method of data handling-type neural networks. J Agric Sci 149: 249-254. http://dx.doi.org/10.1017/S002185961000105X.
Faridi A, Golian A, France J, 2012. Evaluating the egg production of broiler breeder hens in response to dietary nutrient intake from 31 to 60 weeks of age through neural network models. Can J Anim Sci 92: 473-481. http://dx.doi.org/10.4141/cjas2012-020.
Faridi A, France J, Golian A, 2013a. Neural network models for predicting early egg weight in broiler breeder hens. J App Poultry Res 22: 1-8. http://dx.doi.org/10.3382/japr.2011-00446.
Faridi A, Golian A, France J, Heravi Mousavi A, 2013b. Study of broiler chicken responses to dietary protein and lysine using neural network and response surface models. Br Poultry Sci 54: 524-530. http://dx.doi.org/10.1080/00071668.2013.803517.
Farrow SJ, 1984. The GMDH algorithm. In: Self-organizing methods in modeling: GMDH type algorithms (Farrow SJ, ed). Marcel Dekker, NY, USA. pp: 1-24.
Fickler J, 1995. The amino acid composition of feedstuffs. Degussa Corporation, Ridgefield Park, NJ, USA.
Fontaine J, Horr J, Schirmer B, 2001. Near-infrared reflectance spectroscopy enables the fast and accurate prediction of the essential amino acid contents in soy, rapeseed meal, sunflower meal, peas, fishmeal, meat meal products, and poultry meal. J Agric Food Chem 49: 57-66. http://dx.doi.org/10.1021/jf000946s.
GEvoM, 2014. GMDH-type neural network designed by an evolutionary method of modelling. Available at: http://research.guilan.ac.ir/gevom/.
Heaton J, 2008. Introduction to neural networks for Java, second edition. Heaton Res. Inc., St. Louis, MO, USA.
Ivakhnenko AG, 1971. Polynomial theory of complex systems. IEEE T Syst Man Cyb SMC-1(4): 364-378. http://dx.doi.org/10.1109/TSMC.1971.4308320.
Kovalenko IV, Rippke GR, Hurburgh CR, 2006. Determination of amino acid composition of soybeans (Glycine max) by near-infrared spectroscopy. J Agric Food Chem 54: 3485-3491. http://dx.doi.org/10.1021/jf052570u.
Lemke F, Müller JA, 2003. Medical data analysis using self-organizing data mining technologies. Syst Anal Model Sim 43: 1399-1408. http://dx.doi.org/10.1080/02329290290027337.
Lin LIK, 1989. A concordance correlation coefficient to evaluate reproducibility. Biometrics 45: 255-268. http://dx.doi.org/10.2307/2532051.
Mendes AA, Watkins SE, England JA, Saleh EA, Waldroup AL, Waldroup PW, 1997. Influence of dietary lysine levels and arginine:lysine ratios on performance of broilers exposed to heat or cold stress during the period of three to six weeks of age. Poultry Sci 76: 472-481. http://dx.doi.org/10.1093/ps/76.3.472.
Monsanto, 1986a. Amino acids in feed ingredients and their predictability, part I. Monsanto Nutrition Update, Vol. 4:2. Monsanto, St. Louis, MO, USA.
Monsanto, 1986b. Amino acids in feed ingredients and their predictability, part II: animal by-products. Monsanto Nutrition Update, Vol. 4:3. Monsanto, St. Louis, MO, USA.
Monsanto, 1986c. Amino acids in feed ingredients and their predictability, part III: grain-based ingredients. Monsanto Nutrition Update, Vol. 4:4. Monsanto, St. Louis, MO, USA.
Mottaghitalab M, Faridi A, Darmani-Kuhi H, France J, Ahmadi H, 2010. Predicting caloric and feed efficiency in turkeys using the group method of data handling-type neural networks. Poultry Sci 89: 1325-1331. http://dx.doi.org/10.3382/ps.2009-00490.
Müller JA, Lemke F, 2000. Self-organising data mining: an intelligent approach to extract knowledge from data. Libri Publ., Hamburg, Germany.
Nariman-Zadeh N, Darvizeh A, Jamali A, Moieni A, 2005. Evolutionary design of generalized polynomial neural networks for modeling and prediction of explosive forming process. J Mater Process Tech 164-165: 1561-1571. http://dx.doi.org/10.1016/j.jmatprotec.2005.02.020.
NRC, 1994. Nutrient requirements of poultry, 9th rev. ed. National Research Council, National Academy Press, Washington DC, USA.
Oberstone J, 1990. Management science - Concepts, insights and applications. West Publ. Co., NY, USA.
Pérez-Marín DC, Garrido-Varo A, Guerrero-Ginel JE, Gómez-Cabrera A, 2004. Near-infrared reflectance spectroscopy (NIRS) for the mandatory labelling of compound feedingstuffs: chemical composition and open-declaration. Anim Feed Sci Technol 116: 333-349. http://dx.doi.org/10.1016/j.anifeedsci.2004.05.002.
Roush WB, Kochera-Kirby Y, Cravener TL, Wideman Jr RF, 1996. Artificial neural network prediction of ascites in broilers. Poultry Sci 75: 1479-1487. http://dx.doi.org/10.3382/ps.0751479.
Roush WB, Cravener TL, 1997. Artificial neural network prediction of amino acid levels in feed ingredients. Poultry Sci 76: 721-727. http://dx.doi.org/10.1093/ps/76.5.721.
Seginer I, Boulard T, Bailey BJ, 1994. Neural network models of the greenhouse climate. J Agric Eng Res 59: 203-216. http://dx.doi.org/10.1006/jaer.1994.1078.
Steel RGD, Torrie JH, 1981. Principles and procedures of statistics, A biometrical approach, 2nd ed. McGraw-Hill, Singapore.
Vallejo-Cordoba B, Arteaga GE, Nakai S, 1995. Predicting milk shelf-life based on artificial neural networks and headspace gas chromatographic data. J Food Sci 60: 885-888. http://dx.doi.org/10.1111/j.1365-2621.1995.tb06253.x.
Van Kempen T, Bodin JC, 1998. Near-infrared reflectance spectroscopy (NIRS) appears to be superior to nitrogen-based regression as a rapid tool in predicting the poultry digestible amino acid content of commonly used feedstuffs. Anim Feed Sci Technol 76: 139-147. http://dx.doi.org/10.1016/S0377-8401(98)00207-7.
Ward Systems Group, 1993. Neuroshell 2 user’s manual. Ward Systems Group, Inc., Frederick, MD, USA.