CLASSIFICATION OF IMBALANCED DATA USING THE SMOTE AND k-NEAREST NEIGHBOR ALGORITHMS
Abstract
Imbalanced data classification is a crucial problem in machine learning and data mining. Class imbalance degrades classification results: minority-class samples are often misclassified as the majority class. k-Nearest Neighbor (k-NN) is one of the most popular and simplest classification methods, but it has no built-in mechanism for handling imbalanced datasets. In this study, the Synthetic Minority Over-sampling Technique (SMOTE) was applied to address the class imbalance problem on the Credit Card Fraud dataset. Under a 10-fold cross-validation evaluation scheme, SMOTE increased the mean G-Mean from 53.4% to 81.0% and the mean F-Measure from 38.7% to 81.8%.
Keywords: Class imbalance, Synthetic Minority Over-sampling Technique, k-Nearest Neighbor
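The pipeline described in the abstract (SMOTE on the minority class, k-NN classification, 10-fold cross-validation scored by mean G-Mean and F-Measure) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes binary labels with `1` as the minority (fraud) class, oversampling until the training classes are balanced, and k = 5 for both SMOTE and k-NN. It also uses a hypothetical toy data set rather than the Credit Card Fraud data. All function names (`smote`, `knn_predict`, `g_mean_and_f_measure`, `cross_validate`) are illustrative.

```python
import math
import random
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_synthetic, k=5, rng=None):
    """Create n_synthetic points, each on the segment between a minority
    sample and one of its k nearest minority neighbours (needs >= 2 samples)."""
    rng = rng or random.Random(0)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    # Pre-compute the k nearest minority neighbours (by index) of each sample.
    neighbours = [
        sorted((j for j in range(len(minority)) if j != i),
               key=lambda j: euclidean(minority[i], minority[j]))[:k]
        for i in range(len(minority))
    ]
    synthetic = []
    for s in range(n_synthetic):
        i = s % len(minority)  # visit minority samples round-robin
        base, other = minority[i], minority[rng.choice(neighbours[i])]
        gap = rng.random()  # random position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def knn_predict(train_X, train_y, x, k=5):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def g_mean_and_f_measure(y_true, y_pred, positive=1):
    """G-Mean = sqrt(recall * specificity); F-Measure = harmonic mean of
    precision and recall, both computed for the positive (minority) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return math.sqrt(recall * specificity), f_measure


def cross_validate(X, y, folds=10, k=5, use_smote=True, seed=0):
    """Stratified k-fold CV of k-NN (label 1 = minority). SMOTE, if enabled,
    is applied to the training folds only, so test folds stay untouched."""
    rng = random.Random(seed)
    fold_of = [0] * len(y)
    for label in sorted(set(y)):
        members = [i for i, lbl in enumerate(y) if lbl == label]
        rng.shuffle(members)
        for pos, i in enumerate(members):
            fold_of[i] = pos % folds  # deal each class evenly across folds
    g_scores, f_scores = [], []
    for f in range(folds):
        train = [i for i in range(len(y)) if fold_of[i] != f]
        test = [i for i in range(len(y)) if fold_of[i] == f]
        train_X, train_y = [X[i] for i in train], [y[i] for i in train]
        minority = [x for x, lbl in zip(train_X, train_y) if lbl == 1]
        deficit = len(train_y) - 2 * len(minority)  # majority minus minority count
        if use_smote and deficit > 0 and len(minority) >= 2:
            train_X += smote(minority, deficit, k=k, rng=rng)
            train_y += [1] * deficit
        preds = [knn_predict(train_X, train_y, X[i], k) for i in test]
        g, fm = g_mean_and_f_measure([y[i] for i in test], preds)
        g_scores.append(g)
        f_scores.append(fm)
    return sum(g_scores) / folds, sum(f_scores) / folds


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical toy data, NOT the Credit Card Fraud set: 200 majority points
    # around (0, 0) and 20 minority points around (1.5, 1.5).
    X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
    X += [(rng.gauss(1.5, 1), rng.gauss(1.5, 1)) for _ in range(20)]
    y = [0] * 200 + [1] * 20
    for flag in (False, True):
        g, fm = cross_validate(X, y, use_smote=flag)
        print(f"SMOTE={flag}: mean G-Mean={g:.3f}, mean F-Measure={fm:.3f}")
```

The sketch generates synthetic samples only from the training folds, so no point derived from a test sample leaks into training. This ordering is a design choice of the sketch. Each synthetic point lies on the line segment between a minority sample and one of its nearest minority neighbours, as in the original SMOTE formulation.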
License
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with acknowledgement of the work's authorship and its initial publication in this journal.
- Authors may enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (for example, posting it to an institutional repository or publishing it in a book), with acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (for example, in institutional repositories or on their websites) before and during the submission process, as this can lead to productive exchanges as well as earlier and greater citation of the published work (see The Effect of Open Access).