Customer Segmentation Analysis and Churn Prediction in E-commerce Using K-Means Clustering and Random Forest: A Case Study of the Brazilian E-commerce Platform Olist

Authors

Keywords:

Customer Intelligence, Churn Prediction, K-Means Clustering, Random Forest, RFM

Abstract

The rapid growth of e-commerce pushes digital platforms to prioritize customer retention in order to remain sustainable. This study integrates customer segmentation and churn-risk prediction into a single customer intelligence approach. It combines unsupervised learning with the K-Means algorithm and supervised learning with the Random Forest algorithm on the Olist Brazilian E-commerce dataset. Clustering on the Recency, Frequency, and Monetary (RFM) metrics produced the optimal grouping at a Silhouette Score of 0.36. The Random Forest model then predicted churn with an accuracy of 85.37%. Together, the two methods help identify high-risk customer segments, so that management can design targeted retention programs.
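The workflow summarized in the abstract can be illustrated with a minimal sketch using scikit-learn. Everything specific here is an assumption for illustration only: the RFM values are synthetic rather than taken from the Olist dataset, the number of clusters (4), the churn label, and the Random Forest settings are hypothetical, and the scores it prints are unrelated to the 0.36 and 85.37% reported by the study.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, silhouette_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for per-customer RFM values (NOT the Olist data).
rng = np.random.default_rng(42)
n_customers = 600
recency = rng.integers(1, 365, size=n_customers)    # days since last order
frequency = rng.poisson(2, size=n_customers) + 1    # number of orders
monetary = rng.gamma(2.0, 80.0, size=n_customers)   # total spend
rfm = np.column_stack([recency, frequency, monetary]).astype(float)

# Standardize so that no single metric dominates the Euclidean distance.
rfm_scaled = StandardScaler().fit_transform(rfm)

# Segment customers with K-Means; k=4 is an arbitrary illustrative choice.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
segments = kmeans.fit_predict(rfm_scaled)
sil = silhouette_score(rfm_scaled, segments)

# Hypothetical churn label: sampled so that churn is more likely for
# customers with high recency and low frequency (an assumption, not the
# study's definition of churn).
logit = 1.5 * rfm_scaled[:, 0] - 1.0 * rfm_scaled[:, 1]
churned = (rng.random(n_customers) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

# Use the scaled RFM metrics plus the segment id as classifier features.
X = np.column_stack([rfm_scaled, segments])
X_train, X_test, y_train, y_test = train_test_split(
    X, churned, test_size=0.2, random_state=42, stratify=churned
)

clf = RandomForestClassifier(n_estimators=200, random_state=42)
clf.fit(X_train, y_train)
acc = accuracy_score(y_test, clf.predict(X_test))

print(f"Silhouette score: {sil:.2f}")
print(f"Churn accuracy:   {acc:.2%}")
```

In practice the number of clusters would be chosen by comparing silhouette scores across several values of k, and the churn label would come from the platform's own definition of an inactive customer.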

References

Bramer, M. (2020). Principles of data mining (4th ed.). Springer.

Garetti, M., & Taisch, M. (2019). Customer intelligence in digital commerce: A review of clustering techniques. International Journal of Information Systems, 14(2), 112–128.

Han, J., Kamber, M., & Pei, J. (2011). Data mining: Concepts and techniques (3rd ed.). Morgan Kaufmann.

James, G., Witten, D., Hastie, T., & Tibshirani, R. (2021). An introduction to statistical learning: With applications in R (2nd ed.). Springer.

Kaggle. (2018). Brazilian E-Commerce Public Dataset by Olist. Kaggle Repositories. https://www.kaggle.com/datasets/olistbr/brazilian-ecommerce

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, E. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12(85), 2825–2830.

Turban, E., Outland, J., King, D., Lee, J. K., Liang, T. P., & Turban, D. C. (2018). Electronic commerce 2018: A managerial and social networks perspective. Springer.

Witten, I. H., Frank, E., Hall, M. A., & Pal, C. J. (2016). Data mining: Practical machine learning tools and techniques (4th ed.). Morgan Kaufmann.

Zhang, Y., & Chen, X. (2022). Customer churn prediction in e-commerce using random forest and grid search optimization. Journal of Big Data Analytics, 9(1), 45–59.

Zhao, J., & Claster, W. B. (2021). Combining RFM analysis and K-means clustering for customer segmentation in online retail platforms. International Journal of Electronic Commerce Research, 22(3), 204–221.

Published

2026-06-16

How to Cite

Nuryansyah, B. A., Raihanullah, & Rasyid, O. (2026). Customer Segmentation Analysis and Churn Prediction in E-commerce Using K-Means Clustering and Random Forest: A Case Study of the Brazilian E-commerce Platform Olist. Journal of Information Systems and Business Technology, 2(3), 795–801. https://journal.jci.co.id/jisbt/article/view/525
