AN ARITHMETIC MEAN OF INFORMATION GAIN AND CORRELATION RATIO BASED DECISION TREE ALGORITHM FOR ACCIDENT DATASET MINING: A CASE STUDY OF ACCIDENT DATASET OF GOMBE – NUMAN – YOLA HIGH WAY, NIGERIA

Published 30 Jun 2019 • vol 127


Authors:

B. Z. Yahaya, Mathematics & Computer Science Department, Federal University of Kashere, Gombe State, Nigeria
L. J. Muhammad, Mathematics & Computer Science Department, Federal University of Kashere, Gombe State, Nigeria
N. Abdulganiyyi, Mathematics & Computer Science Department, Federal University of Kashere, Gombe State, Nigeria
F. S. Ishaq, Mathematics & Computer Science Department, Federal University of Kashere, Gombe State, Nigeria
Y. Atomsa, Mathematics & Computer Science Department, Federal University of Kashere, Gombe State, Nigeria

Abstract:

Road traffic accident datasets have a large number of attributes with different data types and distinct values. As such, applying information gain based decision tree data mining algorithms to them may not give good accuracy, which can affect the hidden patterns or useful knowledge uncovered from the dataset. This study proposed an Arithmetic Mean of Information Gain and Correlation Ratio Based Decision Tree data mining algorithm, which addresses the bias of information gain based decision tree data mining algorithms and improves their accuracy. The proposed algorithm was demonstrated on the road accident dataset of the Gombe – Numan – Yola Highway, Nigeria, where it achieved 93.29% accuracy against 74.93% for the information gain based decision tree algorithm. The proposed algorithm minimizes the bias that information gain based decision tree algorithms exhibit on datasets with a large number of attributes of different data types and distinct values.
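The abstract does not state the exact formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It scores a categorical attribute by the arithmetic mean of its information gain (in bits) and its correlation ratio η with the class, then picks the attribute with the highest score as the split. Assumptions: the class is binary and encoded as 1/0, so η can be computed as sqrt(SS_between / SS_total); the helper names (`mean_score`, `best_attribute`, etc.) and the toy columns are hypothetical. With a binary class, information gain is at most 1 bit and η lies in [0, 1], so the two terms are on comparable scales before averaging.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def group_by(values, targets):
    """Group target entries by the attribute value they co-occur with."""
    groups = defaultdict(list)
    for v, t in zip(values, targets):
        groups[v].append(t)
    return groups


def information_gain(values, labels):
    """Class entropy minus the weighted entropy after splitting on the attribute."""
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in group_by(values, labels).values())
    return entropy(labels) - remainder


def correlation_ratio(values, labels, positive):
    """Correlation ratio eta = sqrt(SS_between / SS_total).

    Assumption (not from the paper): the class is binary, encoded as
    1.0 for `positive` and 0.0 otherwise.
    """
    y = [1.0 if lab == positive else 0.0 for lab in labels]
    grand_mean = sum(y) / len(y)
    ss_total = sum((t - grand_mean) ** 2 for t in y)
    if ss_total == 0:  # single class: no variance to explain
        return 0.0
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2
        for g in group_by(values, y).values()
    )
    return math.sqrt(ss_between / ss_total)


def mean_score(values, labels, positive):
    """Hypothetical splitting score: arithmetic mean of information gain and eta."""
    return (information_gain(values, labels) + correlation_ratio(values, labels, positive)) / 2


def best_attribute(columns, labels, positive):
    """Pick the attribute (column name) with the highest mean score."""
    return max(columns, key=lambda name: mean_score(columns[name], labels, positive))


# Toy records for illustration only, not taken from the study's dataset.
labels = ["severe", "severe", "minor", "minor"]
columns = {
    "time": ["night", "night", "night", "day"],
    "weather": ["rain", "rain", "clear", "clear"],
}
print(best_attribute(columns, labels, "severe"))  # → weather
```

On the toy data, "weather" separates the classes perfectly (information gain 1.0, η = 1.0, score 1.0), while "time" scores about 0.44 (information gain ≈ 0.31, η ≈ 0.58), so "weather" is chosen as the split. A full tree builder would apply the same selection recursively to each resulting subset.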

Keywords:

Data mining, algorithm, information gain, correlation ratio, arithmetic mean, dataset

Citations:

APA:
Yahaya, B. Z., Muhammad, L. J., Abdulganiyyi, N., Ishaq, F. S., & Atomsa, Y. (2019). An Arithmetic Mean of Information Gain and Correlation Ratio Based Decision Tree Algorithm for Accident Dataset Mining: A Case Study of Accident Dataset of Gombe – Numan – Yola High Way, Nigeria. International Journal of Advanced Science and Technology (IJAST), ISSN: 2005-4238 (Print); 2207-6360 (Online), NADIA, 127, 51-58. doi: 10.33832/ijast.2019.127.05.

MLA:
Yahaya, B. Z., et al. “An Arithmetic Mean of Information Gain and Correlation Ratio Based Decision Tree Algorithm for Accident Dataset Mining: A Case Study of Accident Dataset of Gombe – Numan – Yola High Way, Nigeria.” International Journal of Advanced Science and Technology, ISSN: 2005-4238 (Print); 2207-6360 (Online), NADIA, vol. 127, 2019, pp. 51-58. IJAST, http://article.nadiapub.com/IJAST/Vol127/5.html.

IEEE:
[1] B. Z. Yahaya, L. J. Muhammad, N. Abdulganiyyi, F. S. Ishaq and Y. Atomsa, “An Arithmetic Mean of Information Gain and Correlation Ratio Based Decision Tree Algorithm for Accident Dataset Mining: A Case Study of Accident Dataset of Gombe – Numan – Yola High Way, Nigeria,” International Journal of Advanced Science and Technology (IJAST), ISSN: 2005-4238 (Print); 2207-6360 (Online), NADIA, vol. 127, pp. 51-58, Jun. 2019.