Abstract

Yield prediction is an important agricultural problem: farmers want to know, as early as possible, how much yield they can expect. Data mining techniques offer one way to address this problem. This study evaluated the feasibility of predicting yield in Khuzestan Province, Iran, using the CART and CHAID algorithms. The analyses were performed using IBM SPSS Modeler 14.2. Three cropping seasons from 125 farms, between 2015 and 2018, were selected. The most important attributes were selected, and the average yield was classified using decision trees. The data were partitioned into training (70%) and testing (30%) samples. The CART method produced a decision tree with nine independent variables and 29 nodes; the CHAID method produced a decision tree with nine independent variables and 39 nodes. Both algorithms were evaluated using linear correlation and mean absolute error (MAE). The maximum precision of the CART model was 95% on the training sample and 93% on the testing sample. Based on the models' precision, the results showed that the CHAID and CART models were stable and suitable for predicting sugar beet yield.
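
The workflow summarized above (a 70/30 partition, a CART tree, and evaluation by linear correlation and MAE) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the study's procedure: the study used IBM SPSS Modeler 14.2, whereas this sketch swaps in scikit-learn's `DecisionTreeRegressor`, which implements a CART-style tree. The data are synthetic, and the record count, the nine placeholder predictors, and the `max_leaf_nodes` setting are hypothetical choices. CHAID is not part of scikit-learn and is not shown.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for the farm records (assumption: the real data are not
# available here). Nine placeholder predictors, one continuous yield target.
n_records, n_features = 300, 9
X = rng.normal(size=(n_records, n_features))
y = 50.0 + 5.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=1.0, size=n_records)

# 70% training / 30% testing partition, as described in the abstract.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0
)

# CART-style regression tree. A binary tree with at most 15 leaves has at
# most 29 nodes; this limit is an illustrative choice, not the author's setting.
tree = DecisionTreeRegressor(max_leaf_nodes=15, random_state=0)
tree.fit(X_train, y_train)


def evaluate(model, X_part, y_part):
    """Return (Pearson linear correlation, mean absolute error) for one partition."""
    pred = model.predict(X_part)
    r = np.corrcoef(y_part, pred)[0, 1]
    mae = mean_absolute_error(y_part, pred)
    return r, mae


r_train, mae_train = evaluate(tree, X_train, y_train)
r_test, mae_test = evaluate(tree, X_test, y_test)
print(f"train: r={r_train:.3f}, MAE={mae_train:.3f}")
print(f"test:  r={r_test:.3f}, MAE={mae_test:.3f}")
```

Comparing the two partitions in this way is one simple check of model stability: a large gap between training and testing scores would suggest overfitting.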

Keywords

Yield prediction, Decision tree, Classification and Regression Trees (CART), Chi-squared Automatic Interaction Detection (CHAID)

Article Details

How to Cite
Monjezi, N. (2021). The Application of the CART and CHIAD Algorithms in Sugar Beet Yield Prediction. Basrah Journal of Agricultural Sciences, 34(1), 1–13. https://doi.org/10.37077/25200860.2021.34.1.01
