ENSEMBLE LEARNING WITH THE SMOTEBAGGING METHOD FOR IMBALANCED DATA CLASSIFICATION
Abstract
Imbalanced data classification is a crucial problem in machine learning and data mining. Class imbalance degrades classification results because minority-class instances are often misclassified as belonging to the majority class. Conventional machine learning algorithms are not designed to handle imbalanced data, so their performance tends to be suboptimal. In this study, ensemble learning with the SMOTEBagging method was applied to classify 11 imbalanced datasets. Its performance was compared with that of three conventional classification algorithms: SVM, k-NN, and C4.5. Under a 5-fold cross-validation scheme, SMOTEBagging produced a higher AUC than the conventional algorithms on 10 of the 11 datasets. Ordered from lowest to highest, the mean AUC values were 0.638 (SVM), 0.742 (k-NN), 0.770 (C4.5), and 0.895 (SMOTEBagging). A Friedman test showed that the AUC of SMOTEBagging differed significantly from that of the three conventional methods.
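The abstract does not describe how SMOTEBagging was implemented, so the following is only a minimal sketch of the general idea, not the authors' code. It assumes a two-class problem with numeric features. In each bag the majority class is bootstrapped, part of the minority class is resampled with replacement, and the rest is filled with SMOTE-style synthetic points until the classes are balanced. The 1-nearest-neighbour base learner, the cycling resampling rate, and all function names (`smote`, `smote_bagging_fit`, `smote_bagging_predict`) are illustrative assumptions.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def smote(minority, n_new, k, rng):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        # No neighbour to interpolate towards: fall back to plain copies.
        return [list(rng.choice(minority)) for _ in range(n_new)]
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = sorted((p for p in minority if p is not base),
                        key=lambda p: sq_dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([u + gap * (v - u) for u, v in zip(base, neighbour)])
    return synthetic


def smote_bagging_fit(X, y, n_bags=10, k=5, seed=0):
    """Build n_bags balanced training sets (assumes exactly two classes)."""
    rng = random.Random(seed)
    counts = Counter(y)
    maj_label = max(counts, key=counts.get)
    min_label = next(label for label in counts if label != maj_label)
    majority = [x for x, label in zip(X, y) if label == maj_label]
    minority = [x for x, label in zip(X, y) if label == min_label]
    bags = []
    for i in range(n_bags):
        rate = (i % 10 + 1) / 10  # assumed schedule: 0.1, 0.2, ..., 1.0
        n_resampled = min(len(majority), max(1, round(len(majority) * rate)))
        maj_boot = [rng.choice(majority) for _ in majority]
        min_boot = [rng.choice(minority) for _ in range(n_resampled)]
        min_synth = smote(minority, len(majority) - n_resampled, k, rng)
        bags.append([(p, maj_label) for p in maj_boot] +
                    [(p, min_label) for p in min_boot + min_synth])
    return bags


def nearest_label(bag, x):
    """1-NN base learner (an illustrative choice, not taken from the paper)."""
    return min(bag, key=lambda row: sq_dist(row[0], x))[1]


def smote_bagging_predict(bags, x):
    """Majority vote over the base learners trained on each bag."""
    votes = Counter(nearest_label(bag, x) for bag in bags)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    demo_rng = random.Random(42)
    # Toy data: 60 majority points near (0, 0), 6 minority points near (4, 4).
    X = [[demo_rng.gauss(0, 1), demo_rng.gauss(0, 1)] for _ in range(60)]
    X += [[demo_rng.gauss(4, 1), demo_rng.gauss(4, 1)] for _ in range(6)]
    y = [0] * 60 + [1] * 6
    bags = smote_bagging_fit(X, y)
    print(smote_bagging_predict(bags, [4.0, 4.0]))
```

Varying the share of resampled versus synthetic minority points across bags is meant to make the base learners more diverse. The fraction of bags voting for the minority class could also serve as a ranking score when computing AUC.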