IMPLEMENTATION OF K-NEAREST NEIGHBOR (KNN) ALGORITHM USING RAPID MINER FOR DIABETES DISEASE PREDICTION BASED ON INDIAN PIMA DATASET

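Authors

Tetta Thirza Herdyawan, Dimas Cahyo Saputra, Gabriel Carol Aldosion, Salsha Sabilla Nurhidayat, Sukrinah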

Keywords:

Data Mining, K-Nearest Neighbor, RapidMiner, Diabetes Prediction, Pima Indian Dataset

Abstract

This study applies the K-Nearest Neighbor (KNN) algorithm to diabetes prediction using the publicly available Pima Indian dataset. KNN is a simple yet effective classification technique that is well suited to medical data. The analysis was carried out in RapidMiner and comprised data pre-processing, splitting the data into training and test sets, and validating the classification model. The Pima Indian dataset contains several health indicators, including age, blood pressure, body mass index, and glucose level, which serve as predictive features. The test results show that KNN can classify patients as diabetic or non-diabetic with reasonably high accuracy. Model performance was assessed using accuracy, precision, recall, and the confusion matrix. Applying KNN to this dataset may therefore help decision support systems for the early diagnosis of diabetes.
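As an illustration of the workflow summarized in the abstract, the following is a minimal Python sketch of KNN classification and of the accuracy, precision, and recall metrics derived from a confusion matrix. It is not the RapidMiner process used in the study. The function names (`knn_predict`, `confusion_counts`, `evaluate`), the choice of k = 3, and the tiny two-feature dataset are all hypothetical; the values are invented, assumed to be already scaled to [0, 1], and are not taken from the Pima Indian dataset.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point, nearest first
    neighbours = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def evaluate(y_true, y_pred):
    """Compute accuracy, precision, and recall from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical toy data: (glucose, BMI) pairs scaled to [0, 1]; 1 = diabetic.
train_X = [(0.2, 0.3), (0.3, 0.2), (0.25, 0.35),
           (0.8, 0.7), (0.9, 0.8), (0.75, 0.9)]
train_y = [0, 0, 0, 1, 1, 1]

test_X = [(0.1, 0.2), (0.85, 0.75), (0.35, 0.3), (0.7, 0.8)]
test_y = [0, 1, 1, 1]

y_pred = [knn_predict(train_X, train_y, q, k=3) for q in test_X]
metrics = evaluate(test_y, y_pred)
print(y_pred)
print(metrics)
```

Because KNN relies on distances, features measured on different scales (for example glucose level versus age) would normally be normalized during pre-processing; the sketch sidesteps this by assuming pre-scaled inputs.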

Published

2025-06-30

How to Cite

Tetta Thirza Herdyawan, Dimas Cahyo Saputra, Gabriel Carol Aldosion, Salsha Sabilla Nurhidayat, & Sukrinah. (2025). IMPLEMENTATION OF K-NEAREST NEIGHBOR (KNN) ALGORITHM USING RAPID MINER FOR DIABETES DISEASE PREDICTION BASED ON INDIAN PIMA DATASET. Journal of Information Technology and Informatics Engineering, 1(1), 36-40. https://journal.jci.co.id/jitie/article/view/111