Predicting methionine and lysine contents in soybean meal and fish meal using a group method of data handling-type neural network

Artificial neural network models offer an alternative to linear regression analysis for predicting the amino acid content of feeds from their chemical composition. A group method of data handling-type neural network (GMDH-type NN), with an evolutionary method of genetic algorithm, was used to predict methionine (Met) and lysine (Lys) contents of soybean meal (SBM) and fish meal (FM) from their proximate analyses (i.e. crude protein, crude fat, crude fibre, ash and moisture). A data set with 119 data lines for Met and 116 lines for Lys was used to develop GMDH-type NN models with two hidden layers. The data lines were divided into two groups to produce training and validation sets. The data sets were imported into the GEvoM software for training the networks. The predictive capability of the constructed models was evaluated by their abilities to estimate the validation data sets accurately. A quantitative examination of goodness of fit for the predictive models was made using a number of precision, concordance and bias statistics. The statistical performance of the models developed revealed close agreement between observed and predicted Met and Lys contents for SBM and FM. The results of this study clearly illustrate the validity of GMDH-type NN models to estimate accurately the amino acid content of poultry feed ingredients from their chemical composition.
Additional key words: amino acid; feed ingredients; genetic algorithm; model; poultry; proximate analysis.
Abbreviations used: AA (amino acid); ANN (artificial neural network); CCC (concordance correlation coefficient); CF (crude fibre); CP (crude protein); EB (error attributable to overall bias); ED (error attributable to random disturbance); ER (error attributable to deviation of the regression slope from unity); FAT (crude fat); FM (fish meal); GMDH-type NN (group method of data handling-type neural network); MAPE (mean absolute percentage error); MSPE (mean square prediction error); MST (moisture); RMSPE (square root of the mean square prediction error); SBM (soybean meal).
Citation: Mottaghitalab, M.; Nikkhah, M.; Darmani-Kuhi, H.; López, S.; France, J. (2015). Predicting methionine and lysine contents in soybean meal and fish meal using a group method of data handling-type neural network. Spanish Journal of Agricultural Research, Volume 13, Issue 1, e06-001, 8 pages. http://dx.doi.org/10.5424/sjar/2015131-5877.
Received: 10 Mar 2014. Accepted: 04 Feb 2015.
Copyright © 2015 INIA. This is an open access article distributed under the Creative Commons Attribution License (CC by 3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Funding: Funding for JF was provided by the NSERC Canada Research Chairs Program.
Competing interests: The authors have declared that no competing interests exist.
Correspondence should be addressed to Majid Mottaghitalab: mottaghi2002@yahoo.co.uk, or Secundino López: s.lopez@unileon.es (shared corresponding authors).


Introduction
Methionine (Met) and lysine (Lys) are the two most limiting amino acids (AA) in broiler diets based on corn and soybean meal (SBM). Supplementation of broiler feeds with these AA in synthetic forms is very common in the poultry industry to improve dietary protein, reduce nitrogen excretion, and minimise the cost of feed. More profitable production systems and better carcass yields can be achieved by an adjustment of dietary protein levels. Such adjustment requires information on the levels of AA, particularly Met and

The present study was conducted to examine the capability of GMDH-type NNs in predicting Met and Lys contents of SBM and fish meal (FM) based on their proximate chemical composition.

Data sources
Data were taken from published literature (Roush & Cravener, 1997; Bimbo & Crowther, 1992; NRC, 1994; Fickler, 1995) reporting the necessary information on CP, crude fat, crude fibre, moisture, ash, and AA contents (all in g/100 g feed) of SBM and FM. In all cases, analytical techniques followed accepted procedures of AOAC (1990). A data set was compiled with 119 data lines for Met and 116 lines for Lys, and used to train (calibrate) and validate the GMDH-type NNs. Each data line consisted of CP, crude fat, crude fibre, ash and moisture and measured Lys or Met contents (all in % or g/100 g feed) for an individual sample. Normal distribution of each variable was verified, and the occurrence of outliers was tested using standardized Z-scores (Steel & Torrie, 1981).
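The outlier screening with standardized Z-scores can be illustrated with a minimal sketch. The cut-off of |z| > 3 and the crude protein values below are assumptions made for illustration only; the text does not report the threshold used, and the numbers are not taken from the compiled data set.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize values: z = (x - mean) / sample standard deviation."""
    m = mean(values)
    s = stdev(values)
    return [(x - m) / s for x in values]

def flag_outliers(values, threshold=3.0):
    """Return the indices whose |z| exceeds the threshold (assumed cut-off)."""
    return [i for i, z in enumerate(z_scores(values)) if abs(z) > threshold]

# Hypothetical crude protein values (g/100 g feed), for illustration only
cp = [44.1, 45.0, 46.2, 44.8, 45.5, 47.0, 44.3, 45.9, 46.4, 45.2, 44.7, 60.0]
print(flag_outliers(cp))
```

With small samples the largest attainable |z| is bounded by (n - 1)/sqrt(n), so a fixed cut-off of 3 can never flag anything when fewer than about 11 values are screened.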

Model development
Detailed descriptions of GMDH-type NN terminology, development, application, and examples have been previously reported by several researchers (Farrow, 1984; Müller & Lemke, 2000; Lemke & Müller, 2003; Nariman-Zadeh et al., 2005). In this study, the variables of interest that influence Lys and Met predictions by this multi-input single-output system were the CP, crude fat, crude fibre, ash and moisture contents of the feed samples. The data lines were divided into training and validation sets. Seventy-four input-output data lines for Met (39 from SBM and 35 from FM samples) and 70 input-output data lines for Lys (40 SBM and 30 FM) were randomly selected and used to train the respective network. The data sets were imported into the GEvoM software for network training (GEvoM, 2014). A genetic algorithm was deployed to find the best network architecture, and two hidden layers were considered for prediction. A population of 15 individual values with a crossover probability of 0.7, mutation probability of 0.07, and 280 generations was used to genetically design the ANN. After training, the best-performing networks were selected based on statistical criteria, and then validated using data from samples previously not used for training and deriving the networks.

predicting AA levels in feed ingredients have been developed using linear regression, with an input of either crude protein (CP) (Degussa Corporation, 1990) or proximate chemical constituents of feed ingredients (Monsanto, 1986a,b,c). The National Research Council (NRC, 1994) has accepted these regression approaches for predicting AA composition. However, some of the AA prediction equations show divergence and lead to low R² values (<0.50) in certain cases (Degussa Corporation, 1990; Monsanto, 1986a,b,c). Since the R² value reflects the amount of input variability explained by the equation, a more definitive method of AA prediction is desirable.
Artificial neural networks (ANN) reflect effectively the complex relationship between ingredient composition (inputs) and nutrient levels (outputs). The ANN are applied in many fields to model and predict the behaviour of unknown systems or systems with complexity (or both) based on given input-output data. Using ANN does not require an a priori equation or model. This characteristic is potentially advantageous in modelling biological processes (Dayhoff & De Leo, 2001). The ANN predictions usually result in a closer fit to the data than is accomplished with regression analysis. As a result, this better fit usually leads to more accurate predictions (Ward Systems Group, 1993).
One ANN sub-model is the group method of data handling-type neural network (GMDH-type NN). It is a self-organizing approach by which gradually more complicated models are generated based on the evaluation of their performance on a set of multi-input, single-output data pairs. The GMDH was first developed by Ivakhnenko (1971) as a multivariate analysis method for modelling and identification of complex systems. The GMDH-type NN has been used to circumvent the difficulty of having prior knowledge of the mathematical model of the process being considered. In other words, GMDH-type NN can be used to model complex systems without having specific knowledge of the systems.
The main idea of GMDH is to build an analytical function in a feed-forward network based on a quadratic node transfer function whose coefficients are obtained using linear regression procedures (Farrow, 1984). The use of such self-organizing networks has led to their successful application in a broad range of areas in engineering, science, and economics (Seginer et al., 1994; Vallejo-Cordoba et al., 1995; Roush et al., 1996). The GMDH-type NNs have been used in poultry science for the prediction of broiler performance (Faridi et al., 2011; 2013b), turkey performance (Mottaghitalab et al., 2010), egg production of broiler breeder hens (Faridi et al., 2012; 2013a) and true metabolizable energy content in feather meal and poultry offal meal (Ahmadi et al., 2008).
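The quadratic node described above can be sketched as follows. This is an illustration under stated assumptions, not the GEvoM implementation: it assumes the commonly used two-input form y = a0 + a1·xi + a2·xj + a3·xi·xj + a4·xi² + a5·xj², fits the six coefficients by ordinary least squares through the normal equations, and uses synthetic data. The function names `quad_features`, `solve`, `fit_neuron` and `predict` are hypothetical.

```python
def quad_features(xi, xj):
    """Regressors of the assumed two-input quadratic node."""
    return [1.0, xi, xj, xi * xj, xi * xi, xj * xj]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_neuron(x1, x2, y):
    """Least-squares coefficients of one quadratic node (normal equations)."""
    rows = [quad_features(a, b) for a, b in zip(x1, x2)]
    k = 6
    ata = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    aty = [sum(r[p] * t for r, t in zip(rows, y)) for p in range(k)]
    return solve(ata, aty)

def predict(coef, xi, xj):
    """Output of the node for one pair of inputs."""
    return sum(c * f for c, f in zip(coef, quad_features(xi, xj)))

# Synthetic example: recover known (made-up) coefficients from noise-free data
true_coef = [0.5, 1.0, -2.0, 0.3, 0.1, -0.05]
grid = [(float(a), float(b)) for a in range(4) for b in range(4)]
x1 = [a for a, _ in grid]
x2 = [b for _, b in grid]
y = [predict(true_coef, a, b) for a, b in grid]
coef = fit_neuron(x1, x2, y)
print([round(c, 6) for c in coef])
```

In a GMDH-type network, nodes of this kind are fitted for pairs of inputs, and the outputs of the retained nodes become the inputs of the next layer.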

Neural network models to predict amino acid content of feeds

Results
The range of values for the input and predicted (Met and Lys) variables, and the pair-wise correlation matrices are shown in Table 1 for FM and SBM. In both meals, Met and Lys were positively correlated with CP. The AA contents were negatively correlated with ash in FM, and with crude fibre in SBM.
The optimal structures of the evolved two hidden layer GMDH-type NNs suggested by the genetic algorithm were found with five hidden neurons for Met and Lys in FM and for Met in SBM, and with six hidden neurons for Lys in SBM. The corresponding polynomial equations obtained are:
- Met in FM:
- Met in SBM:
- Lys in FM:
- Lys in SBM:

Statistical procedures
The predictive performance of the models developed was assessed using several measures of precision and bias between estimated (predicted) and observed (actual) values for each response variable. Two measures of precision were used: 1) the proportion of variance accounted for by the model (R²), and 2) the mean square prediction error (MSPE), calculated as:

$$\mathrm{MSPE} = \frac{1}{n}\sum_{i=1}^{n}\left(O_i - P_i\right)^2$$

where $O_i$ is the observed value, $P_i$ is the predicted value and $n$ is the number of observations (Bibby & Toutenburg, 1977). The square root of the MSPE (RMSPE), expressed as a percentage (or proportion) of the observed mean, gives an estimate of the overall prediction error. The MSPE was decomposed into random (disturbance) error (ED), error attributable to deviation of the regression slope from unity (ER), and error attributable to overall bias (EB). The concordance correlation coefficient (CCC) was calculated to evaluate the precision and accuracy of predicted vs. observed values for the models (Lin, 1989). The CCC estimate is the product of two components: 1) the Pearson correlation coefficient, which is a measure of precision (deviation of observations from the best fit line), and 2) a bias correction factor, which indicates how far the regression line deviates from the line of unity (accuracy). Location shift relative to the scale (μ=squared difference of the means relative to the product of both standard deviations) is also reported, where a negative value indicates over-prediction and a positive value indicates under-prediction of observed values by the model. Other measures of bias employed were mean absolute percentage error (MAPE) and average bias (Oberstone, 1990), calculated as:

$$\mathrm{MAPE} = \frac{100}{n}\sum_{i=1}^{n}\frac{\left|O_i - P_i\right|}{O_i}, \qquad \mathrm{Bias} = \frac{1}{n}\sum_{i=1}^{n}\left(O_i - P_i\right)$$

The observed and predicted values of Met and Lys for the training and validation sets are depicted in Figs. 1 and 2, and relevant statistical information is given in Table 2. High R² and low MSPE (RMSPE was always less than 5% of the observed mean) values indicate a high degree of precision in the predictions. The high CCC values and low contribution of ER to MSPE provide measures of a close agreement (accuracy) between observed and predicted values, with a coefficient of regression (slope) between both variables that was in all cases close to unity. Finally, all the statistics related to bias in the predictions show a small deviation without substantial over- or under-estimation of the observed (reference) values.

Discussion
ANN models offer an alternative to linear regression analysis for investigating biological systems. The advantage of using an ANN to predict an output from several input variables is that it does not require an equation or model a priori to construct the relationship between the variables, as is the case with regression analysis (Roush & Cravener, 1997). The potential of GMDH-type NNs to predict the behaviour of unknown systems has been demonstrated in relation to poultry production (Faridi et al., 2013a,b; Mottaghitalab et al., 2010). In this study, the validity of this type of ANN model was examined using a genetic algorithm method to predict Met and Lys content of SBM and FM based on their proximate analysis. It is clearly evident from Figs. 1 and 2 that the evolved networks in terms of simple polynomial equations could successfully predict the output validation data, which were not used during the training process. The resultant statistical tests revealed very close agreement between observed and predicted Met and Lys values, although accuracy of prediction was higher for SBM. Acceptance of all input variables (the five nutritional fractions) by the networks shows that the five selected input variables have some influence on the AA levels in the feed ingredients. This finding is similar to that reported by Cravener & Roush (1999) who compared three types of ANN with linear regression to predict AA levels in feed ingredients, and observed that

Cravener & Roush (2001) showed that genetic algorithm calibration of NN was superior to linear regression and to other ANN calibration approaches in predicting the Met and Lys contents of feeds. There are significant differences in the number of neurons in the hidden layers between our models and those proposed by Roush & Cravener (1997) and Cravener & Roush (1999, 2001), 5 to 6 vs. 160 to 181

The number of neurons in the hidden layers of an ANN model is subject to input variables and network structure. Using too many neurons in the hidden layers can result in several problems. First, it may result in over-fitting and the ANN has so much information processing capacity that the limited amount of information contained in the training set is not enough to train all the neurons in the hidden layers. A second problem can occur even when the training data are sufficient. An inordinately large number of neurons in the hidden layers may increase the time it takes to train the network. The amount of training time can increase to the point that it is impossible to train the ANN adequately (Heaton, 2008).
The AA composition of FM and SBM can be predicted from chemical composition using ANN. It is expected that this method could also be suitable to predict the content of truly digestible AA for poultry, although this analysis could not be performed due to the limited data available.
In conclusion, results of this study can be considered as a basis for accepting the validity of GMDH-type NN models to estimate the AA composition of poultry feed ingredients from their corresponding chemical composition with suitable performance.

Figure 1. Comparison of observed and model predicted methionine (Met) values for fish meal (FM) and soybean meal (SBM). For FM data points 1 to 35 are for the training set and 36 to 55 for the validation set, and for SBM data points 1 to 39 are for the training set and 40 to 64 for the validation set.

Figure 2. Comparison of observed and model predicted lysine (Lys) values for fish meal (FM) and soybean meal (SBM). For FM data points 1 to 30 are for the training set and 31 to 49 for the validation set, and for SBM data points 1 to 40 are for the training set and 41 to 67 for the validation set.
The validation sets consisted of 25 SBM and 20 FM lines for Met and 27 SBM and 19 FM lines for Lys.
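The training and validation counts reported in the text can be cross-checked against the data set sizes (119 lines for Met, 116 for Lys) with a short sketch. The `random_split` helper and its seed are hypothetical; the text states only that training lines were randomly selected, not how the selection was implemented.

```python
import random

# Line counts reported in the text: (training, validation) per ingredient
COUNTS = {
    "Met": {"SBM": (39, 25), "FM": (35, 20)},
    "Lys": {"SBM": (40, 27), "FM": (30, 19)},
}

def totals(aa):
    """Return (training, validation, all) line counts for one amino acid."""
    train = sum(t for t, _ in COUNTS[aa].values())
    valid = sum(v for _, v in COUNTS[aa].values())
    return train, valid, train + valid

def random_split(lines, n_train, seed=0):
    """Shuffle a copy of the data lines and cut it into training/validation sets."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    shuffled = lines[:]
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

print(totals("Met"))  # (74, 45, 119)
print(totals("Lys"))  # (70, 46, 116)

# Example: split placeholder indices for the 64 SBM lines used for Met
train, valid = random_split(list(range(64)), COUNTS["Met"]["SBM"][0])
print(len(train), len(valid))  # 39 25
```

The totals agree with the figure captions: the Met plots run to data points 55 (FM) and 64 (SBM), and the Lys plots to 49 (FM) and 67 (SBM).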
where CP, FAT, ASH, CF and MST represent the input variables: crude protein, crude fat, ash, crude fibre and moisture, respectively. The observed and predicted values of Met and Lys for the training and validation works.

Table 2. Statistics of the group method of data handling-type neural network models for methionine (Met) and lysine (Lys) predictions (observed vs. predicted values for training and validation sets), for fish meal (FM) and soybean meal (SBM).
a R² = proportion of variance accounted for by correlation coefficient; MSPE = mean square prediction error; EB = error attributable to bias, as a percentage of total MSPE; ER = error attributable to regression, as a percentage of total MSPE; ED = error attributable to disturbance (random), as a percentage of total MSPE; RMSPE = root mean square prediction error, expressed as a percentage (%) of the observed mean; Bias = average bias; MAPE = mean absolute percentage error (%); µ = location shift relative to the scale.
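The statistics listed in the table footnote can be computed from paired observed and predicted values as in the following sketch. Several choices here are assumptions rather than details given in the text: the MSPE is decomposed as EB = (mean O - mean P)², ER = s_P²(1 - b)² and ED = (1 - r²)s_O², with b the slope of the regression of observed on predicted values and variances divided by n; the CCC follows Lin's usual estimator; and the location shift is taken as (mean O - mean P)/sqrt(s_O·s_P), so that a negative value corresponds to over-prediction. The function name `fit_statistics` and the example values are hypothetical.

```python
from math import sqrt

def fit_statistics(obs, pred):
    """Precision, concordance and bias statistics for observed vs. predicted values."""
    n = len(obs)
    mo = sum(obs) / n
    mp = sum(pred) / n
    so2 = sum((o - mo) ** 2 for o in obs) / n    # variance of observed (divisor n)
    sp2 = sum((p - mp) ** 2 for p in pred) / n   # variance of predicted (divisor n)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred)) / n
    r = cov / sqrt(so2 * sp2)                    # Pearson correlation
    slope = cov / sp2                            # observed regressed on predicted
    mspe = sum((o - p) ** 2 for o, p in zip(obs, pred)) / n
    eb = (mo - mp) ** 2                          # overall bias component
    er = sp2 * (1.0 - slope) ** 2                # slope deviation component
    ed = (1.0 - r * r) * so2                     # random disturbance component
    ccc = 2.0 * cov / (so2 + sp2 + (mo - mp) ** 2)
    return {
        "R2": r * r,
        "MSPE": mspe,
        "RMSPE_pct": 100.0 * sqrt(mspe) / mo,    # % of the observed mean
        "EB_pct": 100.0 * eb / mspe,
        "ER_pct": 100.0 * er / mspe,
        "ED_pct": 100.0 * ed / mspe,
        "CCC": ccc,
        "Cb": ccc / r,                           # bias correction factor
        "mu": (mo - mp) / sqrt(sqrt(so2 * sp2)), # assumed sign convention
        "Bias": sum(o - p for o, p in zip(obs, pred)) / n,
        "MAPE": 100.0 / n * sum(abs(o - p) / o for o, p in zip(obs, pred)),
    }

# Hypothetical observed and predicted Met contents (g/100 g), for illustration only
observed = [0.62, 0.65, 0.60, 0.68, 0.64, 0.66]
predicted = [0.63, 0.64, 0.61, 0.67, 0.65, 0.65]
for name, value in fit_statistics(observed, predicted).items():
    print(f"{name}: {value:.4f}")
```

Under this decomposition EB, ER and ED sum to the MSPE, so the three percentages add up to 100.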

Table 1. Mean and range of the input and predicted variables (all values in g/100 g feed as fed), and pair-wise correlation matrices (Pearson correlation coefficients).
Fontaine et al. (2001) observed an excellent performance of NIRS to predict the essential AA content of protein-rich feed ingredients, with relative mean deviations below 5%. The prediction of true ileal digestible AA content of protein concentrates was tested by van Kempen & Bodin (1998), with high R² values for digestible Met and Lys in feeds of animal origin and medium to low R² values for the prediction of the same AA in soybean meal.